I came across the following SO question last night - How to intersect two sorted integer arrays without duplicates? This seemed like a good opportunity to do a few things:
- See if I could understand & implement a solution for this by-hand.
- See if I could come up with other solutions using language features.
- Review and learn some more C# (been quite a few months now).
- Write a new blog entry.
First, we’ll probably want to review the operation of intersection. Intersection is one of the basic set operations and is defined on sets A and B such that the set C generated contains only elements present in both A and B.
One way to visualize this is a Venn diagram. Only the elements in the overlapping middle portion of the diagram would be present in the new set after A and B are intersected.

Another way to conceptualize this is an SQL Inner Join, where the rows from a table A are matched up with the rows in table B on a condition. This is elaborated on by Jeff Atwood here.
In this particular version of the problem, we are looking at two arrays being intersected as opposed to two sets being intersected. This leaves us with some requirements we must account for when dealing with this:
- We must perform the intersection such that there are no repeating elements.
- We must maintain sorted order of the elements (sets are unordered by definition).
- The size of each array could be different.
We do have one assumption for this that will help us - The arrays will be sorted in ascending order. With that in mind, how do we actually go about solving this problem? I came up with three techniques:
- A manual technique where we decide if we want to merge the current element at each step.
- Using an existing set implementation to perform the intersection, followed by adding the elements into an array in sorted order.
- Using an existing set implementation that does all of this for us! I believe the implementation is probably some form of a balanced tree.
The manual algorithm looks something like this:
- Go through element-by-element in both A and B.
- Decide if we need the element.
- True iff the current elements in A and B are equal AND the element is not in the new collection C yet.
- If we do need the element:
- Add it to C.
- Look at the next elements in both A and B.
- If we do not need the element
- Look for the next smallest element in either A or B to add.
- Repeat this until the end of either A or B is found.
It’s important to note that this is a pretty specific algorithm based on the assumptions. It would not work as a generalized intersection of any values.
public static List<int> IntersectsManual(int[] a, int[] b)
{
List<int> c = new List<int>();
int aPosition = 0;
int bPosition = 0;
while (a.Length != aPosition && b.Length != bPosition)
{
if (a[aPosition] == b[bPosition])
{
if (c.Count == 0 || c[c.Count - 1] != a[aPosition])
c.Add(a[aPosition]);
aPosition++;
bPosition++;
}
else if (a[aPosition] < b[bPosition])
{
aPosition++;
}
else
{
bPosition++;
}
}
return c;
}
One thing that was neat to notice was that it’s VERY similar to the algorthm for the Union set operation (under the same assumptions). The main difference is that the algorithm tends to be more greedy and pick up values as often as it can, as opposed to as little as it can. This means that:
- If the element is not a duplicate, we want to add the new element regardless of its size.
- After we have reached the end of one array, we must exhaust the other array and add all remaining elements to C.
public static List UnionManual(int[] a, int[] b)
{
List c = new List();
int aPosition = 0;
int bPosition = 0;
while (a.Length != aPosition && b.Length != bPosition)
{
int valueAtA = a[aPosition];
int valueAtB = b[bPosition];
if (a[aPosition] == b[bPosition])
{
if (c.Count == 0 || c[c.Count - 1] != a[aPosition])
c.Add(a[aPosition]);
aPosition++;
bPosition++;
}
else if (a[aPosition] < b[bPosition])
{
if (c.Count == 0 || c[c.Count - 1] != a[aPosition])
c.Add(a[aPosition]);
aPosition++;
}
else
{
if (c.Count == 0 || c[c.Count - 1] != b[bPosition])
c.Add(b[bPosition]);
bPosition++;
}
}
if (aPosition != a.Length)
exhaustArray(c, a, aPosition);
else
exhaustArray(c, b, bPosition);
return c;
}
Note that I have not yet created a complete a set of unit tests for these methods. Use them with caution.
Granted, while these were interesting and fun exercises, this could be error-prone and time consuming to come up with. Much better alternatives are to use the language and its standard library to help you out! Never forget the anthems of OOP - "Reuse! Encapsulation! Abstraction! Elegance!" and so on.
In the case of C#, we can make use of the HashSet
- Create a HashSet with the elements of A contained within it.
- Use the IntersectWith(IEnumerable) method to perform the intersection.
- Create a new List
based on the HashSet, and use Linq to order the set by the natural order of the elements.
public static List<int> IntersectsHashSet(int[] a, int[] b)
{
HashSet<int> c = new HashSet<int>(a);
c.IntersectWith(b);
List<int> d = new List<int>(c).OrderBy(x => x).ToList();
return d;
}
Likewise, a Union can be performed in the same manner (except, well... use UnionWith() instead of IntersectWith()). This particular implementation uses a SortedSet to skip the sorting operation at the end:
public static List<int> UnionSortedSet(int[] a, int[] b)
{
SortedSet<int> c = new SortedSet<int>(a);
c.UnionWith(b);
List<int> d = new List<int>(c);
return d;
}
All and all, these two basic Set operations are fairly easy to implement, standard libraries or not!
To conclude this short bit on graphs (see Part 1, and Part 2) lets look at a sample implementation I've created in Python. Note that this targets the 2.x version of the language, but I believe it would run OK in Python 3 (except for print statements vs function maybe).
This particular implementation has the following properties:
- It's directed.
- It's unweighted (no edge can specify a weight).
- It supports cyclic nodes.
To view the full source yourself, check it out here on BitBucket.
First, lets describe the nodes themselves a bit, in GraphNode.py. Nodes in this graph have the following functionality:
- Can add neighbors (nodes that it can reach).
- Can check if a node is reachable (adjacent).
- Can be iterated on (for searching - Use typical Python iterator syntax).
- Can be printed (and show the values it's connected to).
This is effectively a container for a list of GraphNodes. Check the BitBucket site to see how it's written.
Next, and more interestingly, is the implementation of Graph.py. Features the Graph object supports include:
- Adding a new value.
- Marking two values adjacent.
- Checking if two values are adjacent.
- Determining a path between the two nodes.
- Finding all the paths between a value and the nodes it can reach.
Adding a new value is as simple as appending to a collection of nodes:
def add(self, value):
node = GraphNode(value)
self.__nodes__[value] = node
Marking and checking nodes for adjacency follows this simple pattern: - Get both nodes - If both nodes exist, optionally perform an operation - Return if both nodes existed
For example:
def mark_adjacent(self, x, y):
x_node = self.__nodes__[x]
y_node = self.__nodes__[y]
if x_node and y_node:
x_node.add_neighbor(y_node)
return x_node and y_node
Finally, finding a path between two values x and y involves a fairly straightforward function. Prior to running the algorithm, the function verified both values exist within the graph. If they do, the following recursive algorithm is applied:
- If the current node being viewed has already been visited, the path cannot continue from here (cyclic node).
- Otherwise, mark the node as visited.
- Appending the current node to the path.
- If the current node is the end node, return true.
- Otherwise, recursively search all adjacent nodes to the current node until a path is found
- If the ending node is not found on the recursive step, remove the topmost node from the path and return false.
The algorithm in code is listed below. Note that this is not the method that should be called, but rather the Graph.path(x, y) function instead. Graph.path(x, y) will call this function if it is required to.
def __path_helper__(self, start, end, cur, path):
if cur.visited:
return False
cur.visited = True
path.append(cur)
if cur != end:
for neighbor in cur:
if self.__path_helper__(start, end, neighbor, path):
return True
if path:
path.pop()
return False
else:
return True
There's lots left to be desired in this code, but this is a simple example of what a graph may look like in code. There are tons of variants on the implementation of graphs, but all of the fundamental operations remain the same throughout implementations.
In the first part of this series on graphs, we looked at what graphs are, and a few properties of graphs. I wanted to provide a simple example of how a graph may be used.
The first problem deals with the combinations of increasing values between a range of two bounds. For example, given a range of 3000-4000, the valid set of increasing values looks like this:
range = (3000 - 4000)
valid_values = {3456, 3457, 3458, ..., 3789}
The question is exactly what the carnality of the set of valid values is.
Several properties are important to realize in this problem: 1. Once a prior digit has been selected, all the proceeding digits must be larger than it (ex: once a 4 is used, a number >= 5 must be selected, once a 5 is used, a number >= 6 must be used, etc). 2. The first digit in the range cannot be repeated (in the case of 3xyz, only numbers between 4 and 9 can be used again) 3. The maximum length of any given path is 4 (the amount of possible digits).
Realizing this, we can construct a graph containing the values [4, 9] whose edges connect to every node with a larger value than it. In other words, there are 5 adjacent nodes to the 4 node, 4 adjacent nodes to the 5 node, 3 adjacent nodes to the 6 node, etc. The graph would look like this:

Using this image, it's a matter of finding all the possible paths of length 4 that exist in this graph. I'll leave you to do the manual work and determine the answer :)
In the third part, we will look at a sample graph implementation done in Python.
A Graph data structure is a very generalized organization of data used to define relationships among different pieces of information. These "relationships" can represent many things, such as permutations in data, state machines, physical locations, and probably much more I'm not aware of.
A graph consists of two main pieces: edges and nodes:
- Nodes represent a piece of information contained within the graph.
- Edges represent relationships between two different nodes.
Nodes contain a set of edges that lead to subsequent nodes containing additional information. This, as I like to think of it, is a relationship among two pieces of data in a given graph.
Edges also have several other (potentially important) properties:
Directed vs Undirected Edges can describe a possible direction in which a node-to-node traversal can occur.
An undirected edge from A to B means traversal can occur from A->B, and B->A.
A directed edge from A to B means traversal can only occur from A->B or B->A.
For example, a dungeon game will allow the player to travel back-and-forth between rooms in the dungeon freely, while restricting some doors to work in a single direction.Edge weight describes a "cost" (usually a numeric type) to traverse the particular edge. Edge weight is useful when determining paths throughout a graph with the best efficiency (sometimes, the most direct and obvious is NOT the best choice!)
For example, if the graph represented a map of several neighboring towns, weight could describe the distance between two cities on a particular road.
Graphs in their most general form do not define where edges can be placed, as long as the edge connects two different nodes. This can lead to a few "funny" situations depicted below:
- Circular graphs are graphs where a path exists from a node A back to the node A. This means taking caution when finding paths - it is important to exhaust possibilities that have already occurred and to not repeat them.

- "Cyclic node" (term I made up - not a real term) is really a variant on the circular graph, but it unique in it that at least one edge exists that goes from A to A. This is actually a useful representation when talking about state machines, but again, has the same issue as any other circular graph.

In Part 2, I will outline several problems I have recently solved using graphs (to show you that yes, these things are most certainly useful).
In Part 3, I will describe my implementation of a directed graph within Python.
One of the topics I've enjoyed most in college was data structures. Data structures are the organization of data to make it more effective towards different requirements and needs. These structures are fundamental to the design & creation of algorithms, and the improvement of performance on applications. They are a required part of life for any developer, and the understanding of their details is critical. In addition, they are critical for clear theoretical analysis of programs, while providing an easy way to provide a guess as to how a program will perform without heavy profiling. It is not a substitute for profiling, but data structures allow an easy way to think quickly about how code will do under pressure.
While the topics surrounding data structures can become very theoretical and mathematical, they should by no mean to be written off as "academic nonsense" because they exist in every single application, regardless of how obviously (or not) they stand out in code.
Data structures can range from a simple C struct:
// A very simple data structure
struct node {
char* name;
int age;
}
To full-blown systems that maintain element ordering and good performance upon iteration. They come in all sorts of forms, with some being very specific and built for one particular need (think Syntax trees for example) to extremely generalized ones (graphs with no sense of weight). Here are some of the most widely taught & commonly used data structures as I see it:
This list is up to debate obviously, but an understanding of these is critical to understanding of many other more specialized data structures that are built upon these.
My goal for many of these posts will be to examine some of the fundamentals of these structures, and then dive into more specific variants of many of these data structures.