Solution: K Closest Points to Origin
Let's solve the K Closest Points to Origin problem using the Top K Elements pattern.
Statement
Given a list of points on a plane, where the plane is a 2-D array with (x, y) coordinates, find the closest points to the origin .
Note: Here, the distance between two points on a plane is the Euclidean distance:
Constraints:
-
k
points.length
-
x[i]
,y[i]
Solution
So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.
Naive approach
When thinking about how to solve this problem, it may help to solve a simpler problem—find the point closest to the origin. This would involve a linear scan through the unsorted list of points, with, at each step, a comparison between the closest point discovered so far and the current point from the list. The point closer to the origin would then continue as the candidate solution. This has a runtime complexity of as opposed to the naive solution of sorting the points by distance from the origin and picking the first one, which has a complexity of .
For this reason, when extending the solution to the closest points to the origin, we’d ideally like to do one scan through the list of points. However, if we have to check the current set of closest points with the current point under consideration at each step, we’ll end up with a time complexity of .
Optimized approach using top K elements:
The intuition behind this approach centers around efficiently managing comparisons and updates using a max-heap data structure. By maintaining a max-heap of the closest points sorted by their Euclidean distance from the origin, the algorithm ensures access to the point farthest from the origin among the top k closest points. This allows for a direct comparison between the farthest point in the heap and each new candidate point. If a candidate point is closer, it replaces the farthest point in the heap, keeping the heap filled with the closest points.
By organizing our set of closest points in a max-heap, we reduce the number of comparisons at each step. Instead of comparing all points with the next point from the list, we simply compare the farthest point in the max-heap with the next point. If the new point is closer to the origin, it replaces the farthest point in the heap. By the end of the process, the max-heap contains exactly the closest points to the origin.
In this way, at every step of the scan through the list, the max-heap acts like a sieve, picking out the top points in terms of their distance from the origin.
And as we’ll see, the time complexity is much better than .
The Euclidean distance between a point P(x, y) and the origin can be calculated using the following formula:
Now that we can calculate the distance between and all the points, how will we find the nearest points?
As discussed above, the heap data structure is ideal for this purpose—we’ll use a custom comparison function to define the order of the elements in a heap. Since we plan to populate the heap with coordinate pairs, we’ll define a class Point
that will contain x
and y
coordinates of a point and a function distFromOrigin()
that will calculate the distance from the origin. We’ll will compare the distances of the two points relative to the origin. The point closer to the origin will be considered less than the other point.
We’ll iterate through the given list of points, and if we find one that is closer to the origin than the point at the root of the max-heap, we do the following two things:
- Pop from the max-heap—that is, remove the point in the heap farthest from the origin.
- Push the point that is closer to the origin onto the max-heap.
As we move through the given list of points, this will ensure that we always have the points in our heap that are the closest to the origin.
Below is an illustration of this process.
Let’s look at the code for this solution below:
class ClosestPoints {public static List<Point> kClosest(Point[] points, int k) {PriorityQueue<Point> maxHeap = new PriorityQueue< >((p1, p2) -> p2.distFromOrigin() - p1.distFromOrigin());for (int i = 0; i < k; i++)maxHeap.add(points[i]);for (int i = k; i < points.length; i++) {if (points[i].distFromOrigin() < maxHeap.peek().distFromOrigin()) {maxHeap.poll();maxHeap.add(points[i]);}}return new ArrayList< >(maxHeap);}// Driver codepublic static void main(String[] args) {Point[] pointsOne = new Point[] {new Point(1, 3), new Point(3, 4), new Point(2, -1)};Point[] pointsTwo = new Point[] {new Point(1, 3), new Point(2, 4), new Point(2, -1), new Point(-2, 2), new Point(5, 3), new Point(3, -2)};Point[] pointsThree = new Point[] {new Point(1, 3), new Point(5, 3), new Point(3, -2), new Point(-2, 2)};Point[] pointsFour = new Point[] {new Point(2, -1), new Point(-2, 2), new Point(1, 3), new Point(2, 4)};Point[] pointsFive = new Point[] {new Point(1, 3), new Point(2, 4), new Point(2, -1), new Point(-2, 2), new Point(5, 3), new Point(3, -2), new Point(5, 3), new Point(3, -2)};Point[][] points = {pointsOne, pointsTwo, pointsThree, pointsFour, pointsFive};int[] kList = {2, 3, 1, 4, 5};for (int i = 0; i < points.length; i++) {System.out.print((i + 1) + ".\tSet of points: ");for (Point p: points[i])System.out.print("[" + p.x + " , " + p.y + "] ");System.out.println("\n\tK: " + kList[i]);List<Point> result = ClosestPoints.kClosest(points[i], kList[i]);System.out.print("\tHere are the k = " + kList[i] + " points closest to the origin(0, 0): ");for (Point p: result)System.out.print("[" + p.x + " , " + p.y + "] ");System.out.println("\n");System.out.println(PrintHyphens.repeat("-", 100));}}}
Solution summary
To recap, the solution to this problem can be divided into the following parts:
-
Push the first points to the max-heap.
-
Calculate and compare the distances of the points list with the distance of the top of the max-heap.
-
If the point distance is smaller than the max-heap top’s distance, pop the top from the max heap and push the point.
Time complexity
The algorithm’s time complexity is , where is the total number of points and is the number of points closest to the origin. This is because we need to iterate over all the points and perform operations on a heap of size , which takes time in the worst case.
Space complexity
The memory complexity will be because we need to store points in the heap.
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.