Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java.Interop.Tools.Maven] Initial commit. #1179

Merged
merged 8 commits into from
Mar 11, 2024
Merged

Conversation

jpobst
Copy link
Contributor

@jpobst jpobst commented Jan 12, 2024

Create a new Java.Interop.Tools.Maven.dll assembly that contains functionality for:

  • Retrieving artifacts from Maven
  • Parsing and understanding POM files

This assembly is used by dotnet/android#8649 and replaces our previous usage of MavenNet.dll.

@jpobst jpobst force-pushed the java-interop-maven branch from 64b12e1 to c1b64bb Compare January 12, 2024 20:26
@jpobst jpobst force-pushed the java-interop-maven branch 2 times, most recently from 8178ac5 to 1bb76b2 Compare January 24, 2024 22:36
@jpobst jpobst changed the title [Java.Interop.Maven] Initial commit. [Java.Interop.Tools.Maven] Initial commit. Jan 24, 2024
@jpobst jpobst force-pushed the java-interop-maven branch from 1bb76b2 to cf1efa0 Compare January 24, 2024 22:52
@jpobst jpobst force-pushed the java-interop-maven branch 2 times, most recently from 2ac4f22 to d7be7c4 Compare February 6, 2024 17:38
{
foreach (var property_set in stack) {
foreach (var prop in property_set)
value = value.Replace ($"${{{prop.Key}}}", prop.Value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives me some concern, if only because this could produce lots of garbage, all depending on stack.

I think I'd prefer a "optimization check" + use of StringBuilder.Replace():

if (stack.Count == 0 || !value.Contains("${") {
    return value;
}
var s = new StringBuilder (value);
foreach (var property_set in stack) {
    foreach (var prop in property_set) {
        s.Replace ($"${{{prop.Key}}}", prop.Value);
    }
}
return s.ToString ();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the optimization check. 👍

From my research looking at POM files, this code will rarely be hit with the optimization check, and when it is hit it will generally be for small numbers of properties. The general guidance seems to be that string.Replace is faster than StringBuilder.Replace for small numbers of replaces, despite the extra garbage, as it avoids the overhead of creating the StringBuilder and converting it ToString at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall reading the guidance about "string.Replace is faster … for small numbers of replace". Would be interesting to read more.

@jonpryor
Copy link
Member

Convention elsewhere is to use src/[Assembly Base Name]/[Dotted Namespace Name]/[Type Name].cs.

We should thus have:

  • src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven/DefaultPomResolver.cs
  • src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.Extensions/MavenNetExtensions.cs
  • etc.

I also wonder if the "sub-namespaces" are being too granular here; do we really need .Extensions, .Models, .Repositories? You have a naming convention of *Repository as a suffix, and there's not that many types here, so would it be "bad" to just have them all in the Java.Interop.Tools.Maven namespace?

[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlRootAttribute("project")]
public partial class Project
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…then again, maybe we want generated code to go into its own sub-namespace…?


// https://github.com/mganss/XmlSchemaClassGenerator
// This code was generated by XmlSchemaClassGenerator version 2.1.1057.0 using the following command:
// xscgen maven-4.0.0.xsd --nullable --nc --nr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is maven-4.0.0.xsd? Should that be added to this project for "reference" purpsoes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add it to our repo, but adding a URL for it to the comment seems like a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a generated file, should we move this comment block & related into Project.Partial.cs, so that we don't have to manually update this file whenever we regenerate it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, I imagine you didn't build mganss/XmlSchemaClassGenerator yourself, but instead used a NuGet package such as https://www.nuget.org/packages/XmlSchemaClassGenerator.Console. Would be useful to put this into a .targets file to make updating easier.


public interface IMavenRepository
{
// This name is used for on-disk caching purposes, and thus must be compatible with file system naming rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this means that directory names can also be included or not. Is foo/bar.xml valid?

If directories names can't be included, perhaps this should be named FileName?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we need to place each Maven repository in its own directory in our cache to prevent collisions if they contain the same Java artifact. In this example, it is "central" or "google":

- /MavenCacheDirectory
  -  /central/androidx.core/core/1.9.0/core-1.9.0.aar
  -  /google/androidx.core/core/1.9.0/core-1.9.0.aar

I will try to describe it a little better in the comment.

Name = name;
base_url = baseUrl;

client = new HttpClient {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General guidelines for HttpClient usage is that HttpClient instances should either be long-lived and shared (and static) or through IHttpClientFactory (though I'm not quite sure what that actually looks like in practice…).

newing up and ~immediately disposing of HttpClient instances is not a good idea.

See also: https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-net-http-httpclient#instancing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: HttpClient.BaseAddress is a read+write property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't have been too bad, as MavenRepository itself is generally expected to be long-lived, but switched to using a static HttpClient to be better.

return true;
}

public static MavenRepository Google = new MavenRepository ("https://dl.google.com/android/maven2/", "google");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be readonly, as I can't imagine a scenario where someone should be able to do MavenRepository.Google = someOtherValue.


public static MavenRepository Google = new MavenRepository ("https://dl.google.com/android/maven2/", "google");

public static MavenRepository Central = new MavenRepository ("https://repo1.maven.org/maven2/", "central");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be readonly.

/// <summary>
/// Wraps an <see cref="IMavenRepository"/> and caches files in a local directory.
/// </summary>
public class CachedMavenRepository : IMavenRepository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming question: I think this should be named MavenRepositoryCache.

Rationale/5 minutes of investigation: consider dotnet/runtime, commit dotnet/runtime@4765dd1:

# class contains `Cache`
% git grep 'public.*class.*Cache' src/libraries | wc -l 
      83

# class uses `Cached`
% git grep 'public.*class.*Caching' src/libraries | wc -l
      1

# class uses `Caching`
% git grep 'public.*class.*Caching' src/libraries | wc -l
      10

# class ends with `Cache`
% git grep 'public.*class.*Cache\>' src/libraries | wc -l
      20

Type names containing Cache include:

  • CacheEntryExtensions
  • MemoryCacheEntryExtensions

These don't feel like a good pattern to me, even if this has the most matches.

Cached has one match:

  • XmlCachedSchemaSetResolver

Cache as a suffix feels better:

  • MemoryDistributedCache
  • RuleCache<T>
  • CredentialCache

Caching, meanwhile, only shows up as sub-namespaces:

src/libraries/Microsoft.Extensions.Caching.Memory/ref/Microsoft.Extensions.Caching.Memory.cs:    public partial class MemoryDistributedCache : Microsoft.Extensions.Caching.Distributed.IDistributedCache
src/libraries/Microsoft.Extensions.Caching.Memory/ref/Microsoft.Extensions.Caching.Memory.cs:    public partial class MemoryCache : Microsoft.Extensions.Caching.Memory.IMemoryCache, System.IDisposable
src/libraries/Microsoft.Extensions.Caching.Memory/ref/Microsoft.Extensions.Caching.Memory.cs:    public partial class MemoryCacheOptions : Microsoft.Extensions.Options.IOptions<Microsoft.Extensions.Caching.Memory.MemoryCacheOptions>
src/libraries/Microsoft.Extensions.Caching.Memory/ref/Microsoft.Extensions.Caching.Memory.cs:    public partial class MemoryDistributedCacheOptions : Microsoft.Extensions.Caching.Memory.MemoryCacheOptions
src/libraries/Microsoft.Extensions.Diagnostics/tests/DefaultMetricsFactoryTests.cs:        public class NoCachingMeterFactory : IMeterFactory
src/libraries/System.Linq.Expressions/tests/Dynamic/CallSiteCachingTests.cs:    public class CallSiteCachingTests
src/libraries/System.Runtime.Caching/ref/System.Runtime.Caching.cs:    public abstract partial class CacheEntryChangeMonitor : System.Runtime.Caching.ChangeMonitor
src/libraries/System.Runtime.Caching/ref/System.Runtime.Caching.cs:    public abstract partial class FileChangeMonitor : System.Runtime.Caching.ChangeMonitor
src/libraries/System.Runtime.Caching/ref/System.Runtime.Caching.cs:    public sealed partial class HostFileChangeMonitor : System.Runtime.Caching.FileChangeMonitor
src/libraries/System.Runtime.Caching/ref/System.Runtime.Caching.cs:    public partial class MemoryCache : System.Runtime.Caching.ObjectCache, System.Collections.IEnumerable, System.IDisposable

So Caching feels like a bad idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MavenRepositoryCache implies that this is a cache of MavenRepository, which it is not.

It is a wrapper around a MavenRepository which adds caching features to it.

It feels like CachedMavenRepository is better as MavenRepository is the noun and Cached (or Caching) is the adjective describing the modification to the MavenRepository.

using (var sw = File.Create (file))
repo_stream!.CopyTo (sw);

using (repo_stream) { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird.

I think this would feel better:

using (var sw = File.Create (file))
using (repo_stream) {
    repo_stream.CopyTo (sw);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that is better.


if (repository.TryGetFile (artifact, filename, out var repo_stream)) {
using (var sw = File.Create (file))
repo_stream!.CopyTo (sw);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've annotated TryGetFile() with [NotNullWhen (true)] elsewhere, why is repo_stream! required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't appear to be, it was likely meant to be temporary. Removed.

};

if (parent is not null)
p.Parent = new Parent {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder if there should be a Parent(Parent) constructor…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, a better case could be made that Project should not have a public constructor, so I made it internal and changed this to generate xml that can be deserialized.


static class TestDataExtensions
{
public static Project CreateProject (Artifact artifact, Project? parent = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is within an *Extensions class, I'm surprised this isn't an extension method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, it's a test assembly and I didn't want an entire class for it. 😁

@jpobst
Copy link
Contributor Author

jpobst commented Feb 14, 2024

Convention elsewhere is to use src/[Assembly Base Name]/[Dotted Namespace Name]/[Type Name].cs.

Unfortunately this convention seems outdated, as modern tooling does not play nicely with it.

For one, it's kinda annoying that it forces a minimum size of a file explorer in order to know which directory you want to expand:

image

The other issue is that the "new item" templating system builds your namespace based on [Assembly Base Name].[Dotted Namespace Name]

Thus adding a new class Foo.cs to the Java.Interop.Tools.JavaCallableWrappers folder in VS results in:

namespace Java.Interop.Tools.JavaCallableWrappers.Java.Interop.Tools.JavaCallableWrappers
{
  class Foo
  {
  }
}

I also wonder if the "sub-namespaces" are being too granular here; do we really need .Extensions, .Models, .Repositories?

This is largely related. I don't really care about the "sub-namespaces", but I do want them to be in "sub-folders" rather than all sitting in the project root. Then when you do Add New Item VS automatically puts them in the matching "sub-namespace".

I've always just manually fixed them before to not be in "sub-namespaces", but I thought this seems to be the way modern .NET tooling is pushing and maybe I should stop fighting it on new projects. 🤷‍♂️

I can remove the "sub-namespaces" if needed.

@jpobst jpobst force-pushed the java-interop-maven branch from dbf91ae to f938d56 Compare February 15, 2024 20:37
@jpobst jpobst requested a review from jonpryor February 21, 2024 18:30

public virtual Project ResolveRawProject (Artifact artifact)
{
if (poms.TryGetValue (artifact.ToString (), out var project))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels "implicitly weird" that we require that Project.ToString() be identical to Artifact.ToString(). While Artifact.ToString() does have a comment stating "This is a "well-known" format we use, it should not be changed", Project.ToString() doesn't (currently), nor is there a comment stating that these formats should be consistent with each other.

I think it would be clearer and more explicit if we instead had "something" that was identically named between the two types to name the association. Possible ideas:

  • ArtifactString ("rhymes" with ResolvedDependency.ToArtifactString())
  • Group_Artifact_Version (that's what it contains…)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with ArtifactString and VersionedArtifactString throughout the codebase to make this more explicit and consistent.


public interface IPomResolver
{
Project ResolveRawProject (Artifact artifact);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "Raw" serving in this name? Would this be better as just ResolveProject()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover from earlier designs, changed simply to Resolve.


namespace Java.Interop.Tools.Maven;

public interface IPomResolver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming? Why not name this IProjectResolver, as it (currently) only resolves Projects? Is it so we can resolve other things in the future while reusing the same abstraction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pom and Project are synonyms, standardized on Project.


static class MavenNetExtensions
{
public static string GetInheritedProperty (this ResolvedDependency dependency, ResolvedProject project, Func<ResolvedDependency, string?> property)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why extension methods and not actual methods, when the type you're providing extension methods for is in the same assembly? Is this so that the method isn't visible via IntelliSense by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to actual methods.


MavenVersion (string rawVersion) => RawVersion = rawVersion;

public static MavenVersion Parse (string version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this mirror Artifact and have a TryParse() method as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…except it can't fail…. Weird semantics, especially around .IsValid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is weird. There doesn't seem to be any validation on Maven version numbers, users can't technically put anything they want. If it isn't a "valid" Maven version, it just gets treated like an opaque string for sorting and version matching purposed.

If you do not follow Maven versioning standards in your project versioning scheme, then for version comparison, Maven interprets the entire version as a simple string.

From https://docs.oracle.com/middleware/1212/core/MAVEN/maven_version.htm#MAVEN8855.


public partial class Project
{
public static Project Parse (Stream stream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that Stream will also contain XML, why have a ParseXml() method vs naming both (Stream) and (string) overloads Parse()?

Relatedly, there should probably also be a Parse(XmlTextReader) overload, so that both Parse(Stream) and Parse(string) can share most of their logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate naming logic: do what XDocument does, which has:

Thus, I think we should rename Parse(Stream) to Load(Stream), add a Load(XmlReader) overload, and rename ParseXml(string) to Parse(string):

public static Project Load(XmlReader reader)
{
    var serializer = new XmlSerializer (typeof (Project));
    return (Project) serializer.Deserialize (reader);
}

public static Project Load(Stream stream)
{
    return Load (new XmlTextReader (stream) {
        Namespaces = false;
    });
}

public static Project Parse(string xml)
{
    using (var sr = new StringReader (xml))
        return Load (new XmlTextReader (sr) {
            Namespaces = false;
        });
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use Parse and ParseXml to roughly match XmlDocument.Load and XmlDocument.LoadXml for the reason you state: Parse/Load (string) generally refers to a file rather than an XML snippet.

But Load and Parse work as well.


var serializer = new XmlSerializer (typeof (Project));

using (var sr = new StreamReader (stream))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this wrap a Stream in a StreamReader? The XmlTextReader(Stream) constructor exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


void ResolveCore (PropertyStack properties)
{
if (IsSuperPom)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it called a "super POM" instead of a "root POM"? Maybe it's just me, but "super" doesn't imply (to me) that it has no parents; "root" does. (/ has no parent directory, C:\ has no parent directory, etc.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…and "super POM" might be "better" in that it's the existing terminology of the space, but it doesn't feel "self documenting", as it were.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base_url = baseUrl.TrimEnd ('/');
}

static MavenRepository ()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have an explicit static constructor vs. just initializing in the field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jonpryor
Copy link
Member

Addendum/followup to my comment:

this patch is what I'm thinking:

diff --git a/NuGet.Config b/NuGet.Config
index 7d88f585..4e263ee0 100644
--- a/NuGet.Config
+++ b/NuGet.Config
@@ -6,7 +6,6 @@
 -->
 <configuration>
   <packageSources>
-    <clear />
     <add key="dotnet-public" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json" protocolVersion="3" />
     <!-- For Microsoft.DotNet.GenAPI -->
     <add key="dotnet-eng" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-eng/nuget/v3/index.json" protocolVersion="3" />
diff --git a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
index 9c0f50dc..9b296e87 100644
--- a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
+++ b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
@@ -22,6 +22,13 @@
 
   <ItemGroup>
     <PackageReference Include="Microsoft.SourceLink.GitHub" PrivateAssets="All" />
+    <PackageReference
+        Include="XmlSchemaClassGenerator.Console"
+        Version="2.1.1057"
+        ReferenceOutputAssembly="false"
+        GeneratePathProperty="true"
+    />
   </ItemGroup>
 
+  <Import Project="Java.Interop.Tools.Maven.targets" />
 </Project>
diff --git a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets
new file mode 100644
index 00000000..2d6016c2
--- /dev/null
+++ b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets
@@ -0,0 +1,18 @@
+<Project>
+  <Target Name="UpdateProjectSchema"
+        AfterTargets="ResolveAssemblyReferences">
+    <PropertyGroup>
+      <_Xscgen>"$(PkgXmlSchemaClassGenerator_Console)/tools/net462/XmlSchemaClassGenerator.Console.exe"</_Xscgen>
+      <MavenXsd Condition=" '$(MavenXsd)' == '' ">maven-4.0.0.xsd</MavenXsd>
+    </PropertyGroup>
+    <ItemGroup>
+      <_XscgenOpt Include="$(MavenXsd)" />
+      <_XscgenOpt Include="--nullable" />
+      <_XscgenOpt Include="--nc" />
+      <_XscgenOpt Include="--nr" />
+      <_XscgenOpt Include="-o" />
+      <_XscgenOpt Include="&quot;$(MSBuildThisFileDirectory)&quot;" />
+    </ItemGroup>
+    <Exec Command="$(Runtime) $(_Xscgen) @(_XscgenOpt, ' ')" />
+  </Target>
+</Project>

which I then executed with:

% curl -o maven-4.0.0.xsd 'https://maven.apache.org/xsd/maven-4.0.0.xsd'
% dotnet build src/Java.Interop.Tools.Maven/*.csproj -t:UpdateProjectSchema -p:MavenXsd=`pwd`/maven-4.0.0.xsd

There are multiple problems with this approach, which are all solvable:

  1. It re-adds NuGet.org to NuGet.config. We can't actually do this; we'll need to add XmlSchemaClassGenerator.Console to dotnet-public.
  2. I'm not sure that xscgen maven-4.0.0.xsd --nullable --nc --nr was the actually used command-line, because when I did that, as with the above patch, it generated a Pom.cs file, and the names were different.

And by different, Pom.cs has:

namespace Pom
{
    public partial class Model {}
    // …
}

This PR (#1179) does not contain a partial class Model type. Where did it go? Similarly, you have a partial class Project, but my Pom.cs does not have a Project type declared.

I'm not using "xscgen", but I am using the NuGet package that the github repo links to, with the command-line arguments you mention, but the output doesn't match.

Am I using the right XSD? https://maven.apache.org/xsd/maven-4.0.0.xsd?

@jonpryor
Copy link
Member

@jpobst: I'm increasingly wondering how you created Project.cs. This is the closest I've gotten to Project.cs:

% mono $HOME/.nuget/packages/xmlschemaclassgenerator.console/2.1.1057/tools/net462/XmlSchemaClassGenerator.Console.exe \
    maven-4.0.0.xsd \
    --namespace http://maven.apache.org/POM/4.0.0=Java.Interop.Tools.Maven.Models \
    --typeNameSubstitute T:Model=Project \
    --nullable --netCore --nullableReferenceAttributes \
    -o _o

Using --namespace http://maven.apache.org/POM/4.0.0=Java.Interop.Tools.Maven.Models allows me to have namespace Java.Interop.Tools.Maven.Models.

Using --typeNameSubstitute T:Model=Project allows me to have partial class Project instead of partial class Model.

However, there's still lots of differences:

% git diff --no-index -w Models/Project.cs _o/Java.Interop.Tools.Maven.Models.cs
@@ -26,10 +25,10 @@ namespace Java.Interop.Tools.Maven.Models
     [System.ComponentModel.DescriptionAttribute(@"3.0.0+ The &lt;code&gt;&amp;lt;project&amp;gt;&lt;/code&gt; element is the root of the descriptor. The following table lists all of the possible child elements. 3.0.0+ The &lt;code&gt;&amp;lt;project&amp;gt;&lt;/code&gt; element is the root of the descriptor. The following table lists all of the possible child elements.")]
     [System.CodeDom.Compiler.GeneratedCodeAttribute("XmlSchemaClassGenerator", "2.1.1057.0")]
     [System.SerializableAttribute()]
-    [System.Xml.Serialization.XmlTypeAttribute("Project")]
+    [System.Xml.Serialization.XmlTypeAttribute("Model", Namespace="http://maven.apache.org/POM/4.0.0")]
     [System.Diagnostics.DebuggerStepThroughAttribute()]
     [System.ComponentModel.DesignerCategoryAttribute("code")]
-    [System.Xml.Serialization.XmlRootAttribute("project")]
+    [System.Xml.Serialization.XmlRootAttribute("project", Namespace="http://maven.apache.org/POM/4.0.0")]
     public partial class Project
     {

Even though we renamed Model to Project, we still have XmlTypeAttribute("Model"). This feels like a bug in xcsgen. Additionally, we have lots of Namespace="…" properties on the XmlRootAttribute custom attribute; this happens throughout, and I won't be repeating them.

@@ -225,9 +224,9 @@ namespace Java.Interop.Tools.Maven.Models
         }
         
         /// <summary>
-        /// <para xml:lang="en">Initializes a new instance of the <see cref="Model" /> class.</para>
+        /// <para xml:lang="en">Initializes a new instance of the <see cref="Project" /> class.</para>
         /// </summary>
-        internal Project()
+        public Project()
         {
             this._licenses = new System.Collections.ObjectModel.Collection<License>();
             this._developers = new System.Collections.ObjectModel.Collection<Developer>();

Your comment mentions Model, even though it's on Project. Your constructor is internal when the generated one is public. This feels like the generated file has been manually edited after generation.

@@ -2593,17 +2592,17 @@ namespace Java.Interop.Tools.Maven.Models
     
     [System.CodeDom.Compiler.GeneratedCodeAttribute("XmlSchemaClassGenerator", "2.1.1057.0")]
     [System.SerializableAttribute()]
-    [System.Xml.Serialization.XmlTypeAttribute("ModelProperties", AnonymousType=true)]
+    [System.Xml.Serialization.XmlTypeAttribute("ModelProperties", Namespace="http://maven.apache.org/POM/4.0.0", AnonymousTyp
e=true)]
     [System.Diagnostics.DebuggerStepThroughAttribute()]
     [System.ComponentModel.DesignerCategoryAttribute("code")]
     public partial class ModelProperties
     {
         
         [System.Xml.Serialization.XmlIgnoreAttribute()]
-        private System.Collections.ObjectModel.Collection<System.Xml.Linq.XElement> _any;
+        private System.Collections.ObjectModel.Collection<System.Xml.XmlElement> _any;
         
         [System.Xml.Serialization.XmlAnyElementAttribute()]
-        public System.Collections.ObjectModel.Collection<System.Xml.Linq.XElement> Any
+        public System.Collections.ObjectModel.Collection<System.Xml.XmlElement> Any
         {
             get
             {

More additions of XmlTypeAttribute.Namespace (which I said I'd ignore…), but more oddly is the collection type: you have Collection<XElement>, I got Collection<XmlElement>.

The only way I could find to get xcsgen to emit XElement is to use --pcl, which does allow it to use XElement, but also introduces tons of other changes, removing lots System.ComponentModel.DescriptionAttribute. Additionally, your Project.cs contains both Collection<System.Xml.XmlElement> (e.g. DeveloperProperties._any) and Collection<System.Xml.Linq.XElement> (e.g. ModelProperties._any). Again, this feels like the generated file was manually edited.

If I checkout https://github.com/mganss/XmlSchemaClassGenerator and build xcsgen.proj and run dotnet path/to/xcsgen.dll instead of using mono XmlSchemaClassGenerator.Console.exe, I see the same differences.

@jonpryor
Copy link
Member

@jpobst: I'm also increasingly thinking that my above comments about the differences between my generated Project.cs and the Project.cs within this PR are barking up the wrong tree:

This is processing maven-4.0.0.xsd. There is also a maven-3.0.0.xsd as well as a maven-4.1.0-alpha-10.xsd, implying that 4.1 will eventually come out, and maybe 5.0, or 6.0, or…

Why should the generated types be public? There is a XmlSchemaClassGenerator.Console.exe --assemblyVisible option which makes the generated types internal. Using internal visibility would allow us to:

  1. Put the generated types into a versioned sub-namespace, e.g. Java.Interop.Tools.Maven.Models.v4_0, all internal
  2. Have a "version agnostic" Project type which we full control, which wraps an appropriate version-specific type. Project.Load(Stream) could support v4.0 now, and v5.0 in the future, and because we fully control the Project API, we can ensure this doesn't break anybody.

However, this idea mirrors that of "full bindings for .jar files" vs. "slim bindings": making xcsgen output public means a lower barrier to entry (less work to write a Project wrapper), and means that other assemblies using Java.Interop.Tools.Maven have full access to everything in the XML. The downside is we lose easy control for future versioning -- what happens if v5 is significantly different from v4? -- and "expose everything in the XML" might make for an unwieldy API.

I'm currently inclined to "make xcsgen output internal" and put into a versioned namespace. Even if we want xcsgen output to be public, I still think it should be in a versioned sub-namespace, and we could do:

namespace Java.Interop.Tools.Maven.Models {
    // All hand-written
    public partial class Project {};
}
namespace Java.Interop.Tools.Maven.Models.v4_0 {
    // generated output, using the default names, e.g. `Model` instead of `Project`
    public partial class Model {}

    // hand-written partial class declaration
    public partial class Model : Project {}
}

This way, generated "stuff" is off in a separate (versioned) namespace, and we provide a common abstraction that we control in the form of Project.

@jpobst jpobst force-pushed the java-interop-maven branch from 6896260 to 77087b7 Compare February 22, 2024 19:29
@jpobst
Copy link
Contributor Author

jpobst commented Feb 22, 2024

This feels like the generated file has been manually edited after generation.

Correct, the generated output was manually edited to suit our purposes (and to initially be compatible with https://github.com/Redth/MavenNet/blob/master/MavenNet/Models/PomMavenXsdModels.cs).

I think we're overthinking this. In general, we consider JI tool assemblies to be internal API. (For example, we changed the public API of JavaCallableWrappers in #1174.) So I don't think we should be very concerned with versioning the schema class or potential future API breakages.

We likely won't update the schema class unless a new version includes new dependency resolution features that we want to support.

@jonpryor
Copy link
Member

@jpobst wrote:

I think we're overthinking this.

Indeed!

That said, even ignoring the API compatibility concerns, I generally prefer that generated stuff remain generatable.

To that end, I've added XmlSchemaClassGenerator.Console to the dotnet-public feed, which -- after a Version update -- should allow my previous patch to work.

That leaves the question of XElement vs. XmlElement, and if we just use --pcl we'll consistently get XElement. We'll lose some custom attributes, but I don't think those are relevant for our purposes:

diff --git a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
index 9c0f50dc..2c9774a6 100644
--- a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
+++ b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.csproj
@@ -22,6 +22,13 @@
 
   <ItemGroup>
     <PackageReference Include="Microsoft.SourceLink.GitHub" PrivateAssets="All" />
+    <PackageReference
+        Include="XmlSchemaClassGenerator.Console"
+        Version="2.1.1107"
+        ReferenceOutputAssembly="false"
+        GeneratePathProperty="true"
+    />
   </ItemGroup>
 
+  <Import Project="Java.Interop.Tools.Maven.targets" />
 </Project>
diff --git a/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets
new file mode 100644
index 00000000..f57c545b
--- /dev/null
+++ b/src/Java.Interop.Tools.Maven/Java.Interop.Tools.Maven.targets
@@ -0,0 +1,20 @@
+<Project>
+  <Target Name="UpdateProjectSchema"
+        AfterTargets="ResolveAssemblyReferences">
+    <PropertyGroup>
+      <_Xscgen>"$(PkgXmlSchemaClassGenerator_Console)/tools/net462/XmlSchemaClassGenerator.Console.exe"</_Xscgen>
+      <MavenXsd Condition=" '$(MavenXsd)' == '' ">maven-4.0.0.xsd</MavenXsd>
+    </PropertyGroup>
+    <ItemGroup>
+      <_XscgenOpt Include="$(MavenXsd)" />
+      <_XscgenOpt Include="--namespace http://maven.apache.org/POM/4.0.0=Java.Interop.Tools.Maven.Models" />
+      <_XscgenOpt Include="--typeNameSubstitute T:Model=Project" />
+      <_XscgenOpt Include="--nullable" />
+      <_XscgenOpt Include="--pcl" />
+      <_XscgenOpt Include="--netCore" />
+      <_XscgenOpt Include="--nullableReferenceAttributes" />
+      <_XscgenOpt Include="-o &quot;Models&quot;" />
+    </ItemGroup>
+    <Exec Command="$(Runtime) $(_Xscgen) @(_XscgenOpt, ' ')" />
+  </Target>
+</Project>

This results in a rather sizeable 3348 line diff between Models/Project.cs and Models/Java.Interop.Tools.Maven.cs (not sure how to control the output filename), most of which looks irrelevant:

@@ -23,13 +22,9 @@ namespace Java.Interop.Tools.Maven.Models
     /// <para>The &lt;code&gt;&amp;lt;project&amp;gt;&lt;/code&gt; element is the root of the descriptor.
     ///        The following table lists all of the possible child elements.</para>
     /// </summary>
-    [System.ComponentModel.DescriptionAttribute(@"3.0.0+ The &lt;code&gt;&amp;lt;project&amp;gt;&lt;/code&gt; element is the 
root of the descriptor. The following table lists all of the possible child elements. 3.0.0+ The &lt;code&gt;&amp;lt;project&a
mp;gt;&lt;/code&gt; element is the root of the descriptor. The following table lists all of the possible child elements.")]
-    [System.CodeDom.Compiler.GeneratedCodeAttribute("XmlSchemaClassGenerator", "2.1.1057.0")]
-    [System.SerializableAttribute()]
-    [System.Xml.Serialization.XmlTypeAttribute("Project")]
-    [System.Diagnostics.DebuggerStepThroughAttribute()]
-    [System.ComponentModel.DesignerCategoryAttribute("code")]
-    [System.Xml.Serialization.XmlRootAttribute("project")]
+    [System.CodeDom.Compiler.GeneratedCodeAttribute("XmlSchemaClassGenerator", "2.1.1107.0")]
+    [System.Xml.Serialization.XmlTypeAttribute("Model", Namespace="http://maven.apache.org/POM/4.0.0")]
+    [System.Xml.Serialization.XmlRootAttribute("project", Namespace="http://maven.apache.org/POM/4.0.0")]
     public partial class Project
     {
         
@@ -37,7 +32,6 @@ namespace Java.Interop.Tools.Maven.Models
         /// <para>4.0.0+</para>
         /// <para>Declares to which version of project descriptor this POM conforms.</para>
         /// </summary>
-        [System.ComponentModel.DescriptionAttribute("4.0.0+ Declares to which version of project descriptor this POM conforms.")]
         [System.Diagnostics.CodeAnalysis.AllowNullAttribute()]
         [System.Diagnostics.CodeAnalysis.MaybeNullAttribute()]
         [System.Xml.Serialization.XmlElementAttribute("modelVersion")]
… *lots* of DescriptionAttribute removals…
@@ -225,9 +201,9 @@ namespace Java.Interop.Tools.Maven.Models
         }
         
         /// <summary>
-        /// <para xml:lang="en">Initializes a new instance of the <see cref="Model" /> class.</para>
+        /// <para xml:lang="en">Initializes a new instance of the <see cref="Project" /> class.</para>
         /// </summary>
-        internal Project()
+        public Project()
         {
             this._licenses = new System.Collections.ObjectModel.Collection<License>();
             this._developers = new System.Collections.ObjectModel.Collection<Developer>();
…

@jonpryor jonpryor merged commit 1c9c8c9 into main Mar 11, 2024
1 of 4 checks passed
@jonpryor jonpryor deleted the java-interop-maven branch March 11, 2024 17:36
jonpryor pushed a commit to dotnet/android that referenced this pull request Mar 13, 2024
Change: dotnet/java-interop@3436a30...5bca8ad

  * dotnet/java-interop@5bca8ad6: [build] Automatically add NullableAttributes.cs for netstandard2.0 (dotnet/java-interop#1188)
  * dotnet/java-interop@45437e22: [Java.Interop] suppress IL3050 with `#pragma` (dotnet/java-interop#1201)
  * dotnet/java-interop@1c9c8c9c: [Java.Interop.Tools.Maven] Initial commit. (dotnet/java-interop#1179)

With dotnet/java-interop@5bca8ad6, all instances of nullable
attributes types defined in `NullableAttributes.cs` are now `internal`.

However, `Xamarin.Android.Build.Tasks` was consuming the `public`
types from its reference to `Java.Interop.Tools.JavaCallableWrappers`,
so this change resulted in a build break:

	MavenExtensions.cs(26,32): error CS0122: 'NotNullWhenAttribute' is inaccessible due to its protection level
	MamJsonParser.cs(92,43): error CS0122: 'NotNullWhenAttribute' is inaccessible due to its protection level
	MamJsonParser.cs(92,81): error CS0122: 'NotNullWhenAttribute' is inaccessible due to its protection level

We can apply almost the same logic from dotnet/java-interop@5bca8ad6
to `xamarin-android`; the difference is that neither of the
`netstandard2.0` projects in xamarin-android that need the attributes
have `$(Nullable)='enabled'` and thus our `Condition` fails.  

(`Xamarin.Android.Build.Tasks.csproj` and
`Xamarin.Android.Tools.JavadocImporter.csproj` include `.cs` files from
`external/Java.Interop` that have been NRT annotated and thus need the
nullable attribute types.)

We add `$(Nullable)='annotations'` to the `@(Compile)` which
allows one to use NRT syntax but does not emit any NRT warnings since
these assemblies have not been converted.  We then modify the
`NullableAttributes.cs` inclusion logic to additionally key off the
`$(Nullable)=annotations` value.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Pobst <[email protected]>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants