Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UnionEdgeGraph, Either, ConcatenatedCollection and MutableGraph.addEdges #84

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

saeta
Copy link
Owner

@saeta saeta commented Jun 16, 2020

This PR adds functionality to make it easy to combine the edges of 2 graphs, and the supporting functionality to implement this capability.

There are 2 ways to combine the edges of 2 graphs:

  1. Modify self: in this mechanism, we modify self to copy the edge information into a MutableGraph.
  2. Make a new graph: in this mechanism, we return a new graph that represents the union of edge sets of the two.

In order to implement the latter, we need to be able to construct a collection that is the concatenation of two underlying collections. This is a general pattern, and thus this is extracted and made generic in the ConcatenatedCollection type.

A previous iteration of the ConcatenatedCollection used a specialized Index type as a nested enum of the ConcatenatedCollection. Unfortunately, this makes it impossible to use ConcatenatedCollection for UnionEdgesGraph, as the EdgeId type must work for both VertexEdgeCollection and VertexInEdgeCollection. As a result, we pull that out, and use Either, which solves the problem!

saeta added 5 commits June 15, 2020 13:01
…ining either to avoid reimplementing ConcatenatedCollection a bunch of times.
…graphs together without making a copy.

Also along with this change includes:
 1. A collection which concatenates two heterogeneous collections, so long as their element types are identical.
 2. DefaultInitializable conformances for a variety of fixed-width numerical types.
 3. An implementation of `Either` which is used to implement `UnionEdgeGraph` as well as `ConcatenatedCollection`.
 4. Tests for the above.

Note: The `UnionEdgeGraph` demonstrates why an `Either` type is necessary (assuming
we want to avoid duplicating the implementation of ConcatenatedCollection across
both the `VertexEdgeCollection` and `VertexInEdgeCollection` types).
@saeta saeta requested a review from dabrahams June 16, 2020 00:26
Copy link
Collaborator

@dabrahams dabrahams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair warning: I tweaked my back yesterday and it has made me exceedingly crabby. I've tried not to let that affect my review, but if it did, please don't take it personal-like. I'm just in a bit of pain and grumpy about everything.

@@ -165,3 +165,54 @@ extension MutablePropertyGraph where Self: DefaultInitializable {
edgeProperties: InternalEdgePropertyMap(for: other))
}
}

extension IncidenceGraph where Self: MutableGraph & VertexListGraph {
/// Adds all edges from `other` into `self`, calling `edgeCreationListener` with every new EdgeId.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 cols?

This is a bit of a strange method; the vertices in a different graph don't necessarily have any correspondence to the ones in this graph with the same IDs. Maybe a method that takes a collection of source/target pairs, and another one that exports an appropriate collection? Then you can know based on context whether two graphs happen to have that relationship? Ditto next method.

Also, been wondering about destination. Seems pretty long compared to target, and what do you call the other end?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 cols?

I thought we used 100 cols?

This is a bit of a strange method; the vertices in a different graph don't necessarily have any correspondence to the ones in this graph with the same IDs. Maybe a method that takes a collection of source/target pairs, and another one that exports an appropriate collection? Then you can know based on context whether two graphs happen to have that relationship? Ditto next method.

Yeah, you raise a question I was ignoring when I wrote this that I'd like to continue discussing even though I'm going to document this funny behavior and press onward. In my mind if we're defining a mapping, the most general way of expressing that is probably a closure. I've refactored to do this instead, and added some extra sugar on top to make the default case where VertexIds correspond work similarly. WDYT? (Note: see pending push to include the change.)

Also, been wondering about destination. Seems pretty long compared to target, and what do you call the other end?

Yeah, we should definitely consider renaming things. Potential candidate names include origin, destination, source, target, head, tail. I propose we defer this naming discussion to a separate issue. (#87)

/// A graph containing all the vertices and edges of a base graph, augmented with all the edges of
/// a second graph data structure.
///
/// A `UnionEdgesGraph` allows overlaying the edges of one graph onto the edges of another graph.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about duplicate edges? "Union" typically means duplicates are eliminated. In any case, you need to explain what happens here.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent point. It's definitely not de-duplicating edges. I'm not convinced that de-duplicating edges is the right thing either (given that parallel edges may have different weights or other properties attached). So, I the issue is likely with the word union and not the behavior. As alluded to above, I'm pulling this part of the PR, as I'm not sure it makes sense at this time.

where Base.VertexId == ExtraEdges.VertexId {
/// The name of a vertex in `self`.
public typealias VertexId = Base.VertexId
/// The name of a vertex in `self`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The name of a vertex in `self`.
/// The name of an edge in `self`.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it, I probably suggested “name,” but having second thoughts, since that strongly suggests “String”. I'd go with identity (today).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack; pointed to from #87 for a proper discussion.

/// This operation can be useful when viewing the same set of vertices in multiple ways. The
/// `UnionEdgesGraph` does not modify or even copy either of the two underlying graphs, and all
/// operations on the `UnionEdgesGraph` occur with identical complexity to the underlying graphs'
/// operations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the base graph “primary” in some way, or are they peers? The naming suggests otherwise, but it's not really clear how that plays out.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base is primary in a certain sense. If ExtraEdges contains extra vertices, they are not included in a vertex listing.

///
/// A `UnionEdgesGraph` allows overlaying the edges of one graph onto the edges of another graph.
/// This operation can be useful when viewing the same set of vertices in multiple ways. The
/// `UnionEdgesGraph` does not modify or even copy either of the two underlying graphs, and all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking about copying in this way is a bit fraught. Surely it logically copies both graphs if they are values. You might say instead that it doesn't copy the storage.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this was not correct. (Removed)

_ = src.addVertex()
_ = src.addVertex()

_ = src.addEdge(from: 1, to: 2, storing: "1->2 (src)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's poor form to write these tests assuming you know how vertex IDs are going to be allocated or even that they're going to be Ints. That's not what you're trying to test here, and if the implementation changes, the test will break for reasons unrelated to what it's trying to test.

That's not to mention the fact that this isn't how people are going to use the library. You have an opportunity here to see what it's like to write real code with your API, but it's somewhat squandered.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack; I've switched to using variables instead of assuming contiguous, zero-indexed integers. That said, I'm not sure it makes the code cleaner (but I'm also not sure there's a good way to get around this... perhaps the right approach is to let users assume the vertex id's are zero-indexed; this would inhibit the sort of flexibility / generalizability to other spine implementations, so I'm not sure it's the right way to go).

_ = g2.addVertex()
_ = g2.addEdge(from: 1, to: 2, storing: "1->2 (g2)")

var g = g1.unionEdges(with: g2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be covering so few cases that it could really be broken. Can you not think of more things to check?

XCTAssertEqual(Array(0..<3), Array(g.vertices))
XCTAssertEqual(Array(10..<13), g.vertices.map { g[vertex: $0] })

var recorder = TablePredecessorRecorder(for: g)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this part testing unionEdges? Just exercising code is OK, I guess, but it's really not clear what you're trying to discover by doing a BFS. What you should do is build some generalized semantic tests for the protocols you claim to be modeling, like StdlibProtocolSemanticsTests.swift does, and use those, and test any APIs on the type (such as construction) that are not related to a protocol, and separately run BFS through some tests on a broad enough range of graph structures that you're confident it's handling genericity correctly. I don't think using BFS on this particular data type (especially when it's only got 2 edges) is adding much. WDYT?


class EitherTests: XCTestCase {

func testEquality() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should test reflexivity, symmetry, and transitivity. Then you should factor it out into something general for Equatable in https://github.com/saeta/penguin/blob/master/Tests/PenguinStructuresTests/StdlibProtocolTests.swift. Then the Equatable test gets used on Collection indices in the existing tests. etc.

func testComparable() {
typealias E = Either<Int, Int>
XCTAssert(E.a(1) < .b(0))
XCTAssert(E.b(0) > .a(100000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are required semantic relationships between the Comparable and Equatable operations. Also see note on testEquality.

Copy link
Owner Author

@saeta saeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_i_th wave of responses; stand by for more...

@@ -165,3 +165,54 @@ extension MutablePropertyGraph where Self: DefaultInitializable {
edgeProperties: InternalEdgePropertyMap(for: other))
}
}

extension IncidenceGraph where Self: MutableGraph & VertexListGraph {
/// Adds all edges from `other` into `self`, calling `edgeCreationListener` with every new EdgeId.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 cols?

I thought we used 100 cols?

This is a bit of a strange method; the vertices in a different graph don't necessarily have any correspondence to the ones in this graph with the same IDs. Maybe a method that takes a collection of source/target pairs, and another one that exports an appropriate collection? Then you can know based on context whether two graphs happen to have that relationship? Ditto next method.

Yeah, you raise a question I was ignoring when I wrote this that I'd like to continue discussing even though I'm going to document this funny behavior and press onward. In my mind if we're defining a mapping, the most general way of expressing that is probably a closure. I've refactored to do this instead, and added some extra sugar on top to make the default case where VertexIds correspond work similarly. WDYT? (Note: see pending push to include the change.)

Also, been wondering about destination. Seems pretty long compared to target, and what do you call the other end?

Yeah, we should definitely consider renaming things. Potential candidate names include origin, destination, source, target, head, tail. I propose we defer this naming discussion to a separate issue. (#87)

@@ -415,3 +415,209 @@ extension BidirectionalGraph {
TransposeGraph(self)
}
}

/// A graph containing all the vertices and edges of a base graph, augmented with all the edges of
/// a second graph data structure.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right that there's not an obvious right answer here. I'm going to remove this GraphTransformation for now, and focus on the other issues in the rest of the PR.

/// A graph containing all the vertices and edges of a base graph, augmented with all the edges of
/// a second graph data structure.
///
/// A `UnionEdgesGraph` allows overlaying the edges of one graph onto the edges of another graph.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent point. It's definitely not de-duplicating edges. I'm not convinced that de-duplicating edges is the right thing either (given that parallel edges may have different weights or other properties attached). So, I the issue is likely with the word union and not the behavior. As alluded to above, I'm pulling this part of the PR, as I'm not sure it makes sense at this time.

///
/// A `UnionEdgesGraph` allows overlaying the edges of one graph onto the edges of another graph.
/// This operation can be useful when viewing the same set of vertices in multiple ways. The
/// `UnionEdgesGraph` does not modify or even copy either of the two underlying graphs, and all
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this was not correct. (Removed)

/// This operation can be useful when viewing the same set of vertices in multiple ways. The
/// `UnionEdgesGraph` does not modify or even copy either of the two underlying graphs, and all
/// operations on the `UnionEdgesGraph` occur with identical complexity to the underlying graphs'
/// operations.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base is primary in a certain sense. If ExtraEdges contains extra vertices, they are not included in a vertex listing.

}
}

/// Adapts a property map for a graph to be used with the graph unioned with extra edges.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not quite sure I understand your question...

/// The identifier used to access data.
public typealias Key = Graph.VertexId
/// The value of data stored in `self`.
public typealias Value = Underlying.Value
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should have been consistent with regards to naming. Sorry!

/// The underlying property map.
private var underlying: Underlying

/// Wraps `underlying` for use with a transposed version of its graph.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📠 🍝

}

/// Wraps `underlying` for use with `graph`. (`graph` is taken to help type inference along.)
public init(_ underlying: Underlying, for graph: __shared Graph) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

/// Accesses the `Value` for a given `Key`.
public subscript(key: Key) -> Value {
get { underlying[key] }
set { underlying[key] = newValue }
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely should be.

Copy link
Owner Author

@saeta saeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dabrahams for the great review! I've written now a checkComparableSemantics test, and used it to test both Either as well as the Index type for all Collections. Since the relevant functionality has been reverted for this one, I've filed a follow-up issue to add a generic graph-property-checker.

if first.startIndex != first.endIndex { return .a(first.startIndex) }
return .b(second.startIndex)
}
/// One beyond the last valid index into `self`.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://developer.apple.com/documentation/swift/collection/2944204-endindex :

the position one greater than the last valid subscript argument.

Could you clarify what distinction you're trying to make?

(In the meantime, I've just copied the summary from the Collection documentation.)

}
/// One beyond the last valid index into `self`.
public var endIndex: Index { .b(second.endIndex) }
/// Returns the next valid index after `index`.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is: in a sparse collection there could be a number of invalid indices. (i.e. just because a user has an instance of type Index does not mean it can be used to subscript self.) Although this property is inherent in the Swift collections API, and there's nothing unique about my use here.


/// A collection that is all the elements of one collection followed by all the elements of a second
/// collection.
public struct ConcatenatedCollection<First: Collection, Second: Collection>: Collection where
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack; thanks!

// See the License for the specific language governing permissions and
// limitations under the License.

/// Represents one of two possible cases.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've struggled now with this for a while. I think I have something a bit better now, but I'm definitely not fully happy with it. I'd be interested in discussing further.

}

extension Either: Comparable where A: Comparable, B: Comparable {
/// True iff `lhs` is less than `rhs`.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; done!

_ = src.addVertex()
_ = src.addVertex()

_ = src.addEdge(from: 1, to: 2, storing: "1->2 (src)")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack; I've switched to using variables instead of assuming contiguous, zero-indexed integers. That said, I'm not sure it makes the code cleaner (but I'm also not sure there's a good way to get around this... perhaps the right approach is to let users assume the vertex id's are zero-indexed; this would inhibit the sort of flexibility / generalizability to other spine implementations, so I'm not sure it's the right way to go).

@saeta saeta requested a review from dabrahams July 4, 2020 23:13
@texasmichelle texasmichelle changed the base branch from master to main December 8, 2020 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants