
The perils of palette transfer

#python #ml #tutorials

A few weekends ago, I wanted to brush up on my clustering techniques. I recall — from an image processing class back in college — that you can reduce the number of unique colors in an image down to a few colors while keeping the important ones. This is achieved by doing k-means clustering on the RGB values of the pixels in an image. The number of colors that you want in your reduced palette is the k value in k-means clustering; in other words, the number of cluster centers in RGB space that your image inhabits.

This was particularly useful in the old days of video game graphics if an image needed to be rendered on a device with limited memory. By reducing the palette, you can conform to the limited color palette of PAL or SECAM, etc., since only 8 bits or 16 bits are allocated to the entire color palette (in contrast to the 24-bit RGB that is ubiquitous now). Today’s use? Well, if you’re particularly nostalgic about the artistry of low-bit graphics or building something to display on an LED matrix, this could be the use case.

While playing around, I casually thought: “What if we transferred the reduced palette of one image into another?” It’s a seemingly innocuous question that expanded my knowledge of colors and my parallel-processing toolbox. In this article, I’m going to explain how this artificial task of palette transfer can be done and how to take it further. Get ready to use tools from numpy, scikit-learn and dask. Look for the code in a prepared Colab notebook containing everything explained in this article.

Palette reduction with k-means clustering

So let’s get some definitions in place before anything else. What I mean by palette is the set of RGB pixel values in an image. This number is usually in the tens of thousands for a small photographic image. Palette reduction is the algorithmic selection of a subset of the original palette of an image and mapping of all pixel values in the original palette to the closest values in the reduced palette.

The code for this is relatively straightforward: run k-means clustering on all the pixels of an image and the resulting cluster centers are your reduced palette. We have to be careful as the cluster centers are not np.uint8 type but rather np.float64. When we index the cluster centers with the labels from kmn.predict(src), the resulting pixel values are all floats. The simple solution is to round these values to the nearest integer and cast them to np.uint8 before displaying. Here’s what a reduced palette class could look like:

import numpy as np
from sklearn.cluster import KMeans


class KMeansReducedPalette:
    def __init__(self, num_colors):
        self.num_colors = num_colors
        # Random state for reproducibility.
        self.kmeans = KMeans(num_colors, random_state=0xfee1600d)
        self.source_pixels = None

    def _preprocess(self, image):
        assert image.shape[-1] == 3, 'image must have exactly 3 color channels'
        assert image.dtype == 'uint8', 'image must be in np.uint8 type'

        # Flatten pixels, if not already.
        if len(image.shape) > 2:
            return image.reshape(-1, 3)

        return image

    def fit(self, image):
        image_cpy = image.copy()
        self.source_pixels = self._preprocess(image_cpy)
        self.kmeans.fit(self.source_pixels)

    def recolor(self, image):
        original_shape = image.shape
        image = self._preprocess(image)
        recolor_idx = self.kmeans.predict(image)
        recolor = self.kmeans.cluster_centers_[recolor_idx]
        recolor = recolor.reshape(original_shape)

        return np.round(recolor).astype(np.uint8)

The class definition is pretty basic: it contains the number of colors in the reduced palette and a sklearn.cluster.KMeans instance. First, we have a preprocessing function that makes sure the image is shaped up (pun intended) for the KMeans operations. The reshaping turns the 3-dimensional matrix of \(W \times H \times C\) into a 2-dimensional matrix of \(WH \times C\), where \(W\) is the width of the image, \(H\) is the height of the image and \(C\) is the three RGB color channels. Then, the KMeansReducedPalette.fit() function works as a wrapper around KMeans.fit() that creates the cluster centers in RGB space. Finally, the recolor() function transforms the original palette of an image into the reduced palette of the fitted image. Let’s look at some examples of a palette reduction, starting with a quick usage sketch.
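Something like this will do (flower.jpg is a hypothetical file name; any uint8 RGB image array works):

# A minimal usage sketch: load an image, fit an 8-color palette, and
# recolor the image with it (flower.jpg is a hypothetical file name).
from PIL import Image

image = np.asarray(Image.open('flower.jpg').convert('RGB'))

palette = KMeansReducedPalette(num_colors=8)
palette.fit(image)                # cluster the image's pixels in RGB space
reduced = palette.recolor(image)  # map every pixel to its nearest centroid

Image.fromarray(reduced).save('flower_8_colors.png')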

Various levels of palette reduction of a pink flower

Pretty, right? Even with the reduced palette, there is still a lot of charm to the image. Says a lot about the photographer’s chops, I’d say. One way to visualize these clusters is just to scatter plot them. Our data here are the unique RGB values of the image, which live in three dimensions: one for each color channel. This lends itself quite well to a 3D scatter plot.
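The plot below can be produced with something along these lines (a sketch; image is the array we fitted on and palette is the fitted KMeansReducedPalette from before):

# Sample 1000 unique colors and plot them in RGB space, with the
# fitted cluster centers overlaid as red crosses.
import matplotlib.pyplot as plt

unique_colors = np.unique(image.reshape(-1, 3), axis=0)
sample = unique_colors[
    np.random.choice(len(unique_colors), size=1000, replace=False)
]
centers = palette.kmeans.cluster_centers_

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(*sample.T, c=sample / 255, s=8)         # each point in its own color
ax.scatter(*centers.T, c='red', marker='x', s=80)  # centroids as red crosses
ax.set_xlabel('R')
ax.set_ylabel('G')
ax.set_zlabel('B')
plt.show()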

Scatter plot of 1000 random colors in Gozha’s flower in RGB space with 8 centroids marked with a red cross

At 8 colors, the clusters are relatively loose, but comparing the points colored by their centroid against the original colors, I think the clusters are pretty good. The colors do not diverge far, although there are some light greens being cast grayish-green. With more points, the clusters would be more representative of the salient colors.

Reduced palette transfer

Now, let’s tackle the second problem: transferring one image’s reduced palette into another. It turns out this is relatively easy: we don’t need any further programming aside from passing a target image into the palette’s recolor() function. Let’s look at some transfers:

Transferring the reduced palette of Gozha’s flower onto other images. From left to right, original image by MIO ITO, Daniel Roe, and Carles Rabada on Unsplash

What happens here is that the distance of each pixel in the new image is measured against the reduced palette (i.e. the cluster centers). The RGB values of the transferring pixel are then swapped with the closest cluster center in the palette. This is why, in situations where the source image doesn’t have many colors or the reduced palette is too small, we get some cartoon-like effects in the overall image: the colors in the transferring image don’t get a good enough color to swap with, but instead take the nearest cluster center despite it being very distant from the transferring pixel value.
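Concretely, a transfer is nothing more than fitting on one image and recoloring another (a sketch; source_image and target_image stand in for preloaded uint8 arrays):

# Palette transfer: fit the reduced palette on one image, then recolor
# a different one (source_image and target_image are assumed preloaded).
palette = KMeansReducedPalette(num_colors=8)
palette.fit(source_image)                    # learn the reduced palette here...
transferred = palette.recolor(target_image)  # ...and apply it here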

Experiments with palette transfer

Let’s experiment a bit with this before moving on. We did clustering based on all the pixels in the image but in reality, not all the pixels are unique. Chances are, around a particular region of an image, there are maybe tens or hundreds of the same pixel RGB value (particularly for static backgrounds, skies, and the like). Naturally, this will skew cluster centers around more abundant colors in the image; these are the colors that would immediately pop in the original image making them visually salient. For fun, let’s hack around and do clustering only on the unique pixels in the image (instead of all the pixels) and give them each equal opportunity. We will create a new class to enable this:

class UniqueKMeansReducedPalette(KMeansReducedPalette):
    def __init__(self, num_colors):
        super().__init__(num_colors)

    def fit(self, image):
        image_cpy = image.copy()
        pixels = self._preprocess(image_cpy)
        super().fit(np.unique(pixels, axis=0))

The new class inherits the KMeansReducedPalette class, which makes things much easier down the line: any further functionality that we add to KMeansReducedPalette will be included in UniqueKMeansReducedPalette. The fit() function is overridden to fit the palette on the set of unique pixels in the image. This doesn’t add that much to the overall computation: k-means clustering takes \(O(nk)\) time per iteration for an image with \(n\) pixels, which in practice is more expensive than finding the unique pixels (a sort, roughly \(O(n \log n)\)). Let’s compare the results.
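Both versions in the comparison below can be produced with something like this (a sketch; image stands in for a preloaded uint8 array):

# Fit both variants on the same image and reduce that image's palette
# with each, for a side-by-side comparison.
whole_palette = KMeansReducedPalette(num_colors=8)
whole_palette.fit(image)
whole_reduction = whole_palette.recolor(image)

unique_palette = UniqueKMeansReducedPalette(num_colors=8)
unique_palette.fit(image)
unique_reduction = unique_palette.recolor(image)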

Comparison of whole image palette reduction and unique pixels palette reduction on different images

From afar, there are very small differences: there seems to be about the same distribution of colors in the reduced palettes which can be attributed to the initial conditions of the k-means clustering algorithm.

To my eyes, there is a subtle difference in the quality of the colors. Clustering on the entire image resulted in a slightly more saturated reduced palette in comparison to the clustering of the unique pixels which is more muted. There is subtly more contrast in the top row compared to the bottom rows. I attribute this to the result of clustering in the space of unique colors: essentially averaging the pixel values surrounding a locus without weighing colors that are more prevalent.

We could go into a numerical analysis of color metrics to verify this (which I did, endlessly) but that would stray too far from the original intent of this article, which is just to experiment with palette transfers, so I will leave this as an exercise to the courageous among you. Here are some resources that I read through: color vision, chroma calculation.

Colorful random walks

Let’s revisit what k-means clustering gives us. After clustering, we have k cluster centers which determine the label of the pixels around them. If a point is closer to one centroid than to any other, it is assigned to that centroid. Essentially, this creates k partitions in the data space, the boundaries of which can be visualized using a Voronoi diagram. In the scatter plot earlier, we can partially see the boundaries of the partitioning in 3D space. Since plotting a 3D Voronoi diagram is difficult, let’s turn to PCA, reduce the space to 2D, and plot the Voronoi tessellation.
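Something like this sketch produces the plot (assuming image and the fitted palette from before; the Voronoi cells are computed from the PCA-projected centroids):

# Project unique colors and centroids to 2D with PCA, then draw the
# Voronoi tessellation of the projected centroids.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from scipy.spatial import Voronoi, voronoi_plot_2d

pixels = np.unique(image.reshape(-1, 3), axis=0)
pca = PCA(n_components=2).fit(pixels)

pixels_2d = pca.transform(pixels)
centers_2d = pca.transform(palette.kmeans.cluster_centers_)

fig, ax = plt.subplots()
voronoi_plot_2d(Voronoi(centers_2d), ax=ax, show_vertices=False)
ax.scatter(*pixels_2d.T, c=pixels / 255, s=4, zorder=0)
ax.scatter(*centers_2d.T, c='red', marker='x', s=80, zorder=3)
plt.show()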

Voronoi tessellation of 8 clusters of RGB pixels of a pink flower in 2D via PCA decomposition

What is clear from the plot is that the partitions are 2D geometric polygons. In the original 3D space, these partitions would be 3D polyhedrons. Also, the PCA “lies” about some of the points, especially at the boundaries, as the points have been projected down to a lower dimension (hence losing information). Try not to scrutinize the 2D Voronoi diagram too hard.

Knowing the boundaries, we can randomly select RGB values within the partition as the recoloring of our new image rather than depending solely on the centroid’s RGB value. This gives us more variety to recolor our images. As we already have the centroids, we can start there and take small steps to reach random RGB values. This process is called a random walk.

Random walk variant 1: walk everywhere

Let’s understand one variation of a random walk which is to move by 1 intensity unit in any RGB direction. This random walk can potentially explore the entirety of the 16,777,216 colors (if you take enough steps). Let’s code this up:

class KMeansReducedPalette:
    # ... omitted

    def random_walk_recolor(self, image, max_steps):
        original_shape = image.shape
        image = self._preprocess(image)
        centroids = self.kmeans.predict(image)
        start = np.round(
            self.kmeans.cluster_centers_[centroids]
        )

        diff = np.zeros(image.shape)

        for _ in range(max_steps):
            walk = np.eye(3)[
                np.random.randint(0, 3, size=image.shape[0])
            ]
            sign = np.random.choice(
                [-1, 1], size=(image.shape[0], 1)
            )
            diff += walk * sign

        recolor = np.clip(start + diff, 0, 255).astype(np.uint8)

        return recolor.reshape(original_shape)

The random_walk_recolor() function takes an image and the maximum number of steps to walk away from the cluster center. First, we recolor pixels as we did in the recolor() function which is stored as the starting points (start). Then we initialize a difference array (diff) which will determine the distance from the starting points for each color channel.

Then we loop up to the maximum number of steps we want to take and, on each iteration, take a walk of 1 unit in a randomly chosen channel. This is stored in walk: we randomly index a 3x3 identity matrix using np.eye() and np.random.randint() for each pixel. For example, for an image with 9 pixels:

walk = [[ 0, 0, 1],
        [ 0, 0, 1],
        [ 1, 0, 0],
        [ 0, 1, 0],
        [ 0, 0, 1],
        [ 0, 1, 0],
        [ 1, 0, 0],
        [ 1, 0, 0],
        [ 1, 0, 0]]

Then we need to randomly select the sign of the walk (either in a positive or negative direction) using np.random.choice() on the set {-1, 1} which then transforms the walk array into:

walk = [[ 0, 0, 1],
        [ 0, 0,-1],
        [-1, 0, 0],
        [ 0,-1, 0],
        [ 0, 0, 1],
        [ 0,-1, 0],
        [ 1, 0, 0],
        [-1, 0, 0],
        [-1, 0, 0]]

Finally, we add the walk into the diff array. If we repeat this enough times, we get some random distribution of steps away from the cluster centers that is at most the maximum number of steps that we desire. The final random walk diff might be something like so for a maximum step size of 3:

diff = [[ 0, 0, 1],
        [ 0, 0,-3],
        [-1, 1, 1],
        [ 0,-1, 0],
        [ 0, 0, 3],
        [ 0,-1, 0],
        [ 2, 0, 1],
        [ 1, 0, 0],
        [-1,-1, 1]]

Due to the randomness of the walk, we might end up barely moving away from the cluster center, though this is unlikely for a large step count. The final recoloring is simply the starting points of the k-means recoloring plus the walk. What does the recoloring look like for different step values with a 32-color palette on a different transferring image? For this example, I will transfer the palette of Rabada’s seaside to Gozha’s flower.
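In code, that might look like this (a sketch; seaside and flower stand in for preloaded uint8 arrays, and the step values are illustrative):

# Fit a 32-color palette on the seaside image and recolor the flower
# with random walks of increasing maximum step counts (example values).
palette = KMeansReducedPalette(num_colors=32)
palette.fit(seaside)

walked = [
    palette.random_walk_recolor(flower, max_steps=steps)
    for steps in (4, 16, 64)
]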

Noisy random walks at different max walk distances

Well, it looks like we just applied some noise to it, but the general transfer is maintained. The noise does get rather overwhelming at very high distances. But it’s cool nonetheless.

Random walk variant 2: neighborhood walk

The first random walk variant is not guaranteed to visit the unique pixels we have in the original image. In fact, it may even transcend the boundaries of the partitioning that we mentioned earlier. What if we instead wanted to only walk around the neighboring pixels to the cluster center? Also, we would want to stay within a partition; we don’t want to be wandering around the wrong neighborhood now, do we?

To find a random pixel that neighbors a centroid, we would need to calculate the distance of each of the pixels in a cluster to the centroid and randomly pick the closest neighbors up to some max walking step. One optimization we can make in this particular scenario is to precalculate these distances as they will never change once clustering is done.

The strategy here is to perform all distance calculations for all pixels within each cluster to its nearest centroid during the fit() stage, sort the pixels based on their distance to the centroid in ascending order, and store it so it can be used later. Here’s the code for that:

class KMeansReducedPalette:
    # ... omitted

    def fit(self, image):
        image_cpy = image.copy()
        self.source_pixels = self._preprocess(image_cpy)
        self.kmeans.fit(self.source_pixels)

        self.centroid_nearest_pixels = []

        for ci in range(self.num_colors):
            pixels_ci = self.source_pixels[
                self.kmeans.labels_ == ci
            ]
            distances_ci = np.sqrt(np.sum(
                np.square(
                    pixels_ci - self.kmeans.cluster_centers_[ci]
                ),
                axis=1,
            ))
            pixels_ci = pixels_ci[np.argsort(distances_ci)]

            self.centroid_nearest_pixels.append(pixels_ci)

    # ... omitted

What this does is loop through the clusters in the palette and store, for each pixel, its distance to the cluster center. First, we get all the pixels from the original image that are in the current cluster (pixels_ci) and calculate their Euclidean distance to the cluster center (distances_ci). Then we use np.argsort() to get the sorted indices of the distances and use them to index pixels_ci. Now we are left with a matrix of pixels in ascending order of distance down the rows.

To do a random walk, we just select a random number up to the maximum index value (the random number represents the number of steps). We would also need to make sure that the index does not exceed the size of the cluster. Here’s the code to recolor with this neighborhood random walk:

class KMeansReducedPalette:
    # ... omitted

    def random_neighborhood_walk_recolor(self, image, max_steps):
        original_shape = image.shape
        image = self._preprocess(image)
        recolor = image.copy()
        centroid_idxs = self.kmeans.predict(image)

        for ci in range(self.num_colors):
            n_pixels = np.sum(centroid_idxs == ci)

            if n_pixels == 0:
                continue

            n_neighbors = self.centroid_nearest_pixels[ci].shape[0]
            neighbor_idxs = np.random.randint(
                min(max_steps, n_neighbors),
                size=(n_pixels,),
            )
            recolor_ci = (
                self.centroid_nearest_pixels[ci][neighbor_idxs]
            )
            recolor[centroid_idxs == ci] = recolor_ci

        return recolor.reshape(original_shape)

The code is relatively simple: we assign each pixel to its nearest centroid with predict() and then iterate through each cluster, picking a random nearby neighbor for each of its pixels. We use np.random.randint() to generate random indices into the distance-sorted neighbors. Since the random index is capped at max_steps, we never pick a neighbor ranked beyond max_steps. Additionally, taking the minimum of n_neighbors and max_steps ensures the index stays within the confines of the neighborhood. The resulting random walks are shown below:

Noisy neighborhood walks at different max walk distances

The noise is a lot more subtle and barely noticeable at the lower distances. The “semantics” of the noise are also more sensible: it doesn’t stray too far away from what the original pixel is meant to represent. It’s almost as if you got a photo from a DSLR with a bad sensor or some high ISO value. Nice!

Whole palette transfer

All the techniques we’ve seen reduce the palette before making a transfer. Arguably, this is a much easier transfer to make as we only need to compare n pixels with k cluster centers. What if, instead, we wanted to do a palette transfer using all the unique pixels? Well, assuming the source image and the transferring image have palettes of similar size, we would need an \(O(n^2)\) comparison to find, for each pixel of the transferring image, the nearest pixel in the source image.

This is problematic: our 320 by 480 pixel working images have on the order of \(10^5\) unique pixels. A pairwise comparison would yield a matrix on the order of \((10^5)^2\) or \(10^{10}\) entries during the computation. Clearly, this cannot fit into memory unless you have a massive amount of memory (and most commodity computers don’t). We could go with looping but Python loops are notoriously slow.

An ideal middle ground can be achieved if we deploy some form of vectorized computation (which is faster than loops) along with array chunking. Chunking divides a large vectorized computation into smaller, manageable computations and collects the pieces to form the final output array. The library dask is a popular implementation in Python that will allow us to do this while keeping the numpy syntax flavor in the code. dask enables parallelization too, which means we can potentially defer these computations to different machines or fully exploit all the cores of a computer. Here’s what the code looks like:

import dask.array as da
import numpy as np
from dask.diagnostics import ProgressBar


class EntirePalette:
    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self.source_pixels = None

    def _preprocess(self, image):
        # ... omitted, same as KMeansReducedPalette

    def fit(self, image):
        self.source_pixels = np.unique(
            self._preprocess(image), axis=0
        )

    def recolor(self, image):
        image_shape = image.shape
        image = self._preprocess(image)
        image_colors = np.unique(image, axis=0)

        self_da = da.from_array(
            self.source_pixels.astype(np.int64),
            chunks=(self.chunk_size, 3),
        )
        other_da = da.from_array(
            image_colors.reshape(-1, 1, 3).astype(np.int64),
            chunks=(self.chunk_size, 1, 3),
        )

        idxs_da = (
            da.square(other_da - self_da)
            .sum(axis=2)
            .argmin(axis=1)
        )

        with ProgressBar():
            mapping_idx = idxs_da.compute()

        colormap = {
            tuple(a): tuple(b)
            for a, b
            in zip(image_colors, self.source_pixels[mapping_idx])
        }

        image_recolored = np.array(
            [colormap[tuple(rgb)] for rgb in image]
        )
        return image_recolored.reshape(image_shape).astype(np.uint8)

The entire palette transfer class has a very basic constructor which takes a chunk size (defaulting to 1024). This trades off memory against parallelization on a small computer, but you can use larger values if you have more memory. The fitting step just stores all the unique pixels in an image. We only take the unique pixels in the source image to minimize computation since we can map all pixels in the transferring image to a subset of the source pixels. This mapping will be created when we perform recolor().

The meat of the class is the recolor() function. First, we preprocess the transferring image and collect all the unique pixels. Then we create a dask.array.core.Array object for the unique pixels in the source image and the transferring image. The transferring image is reshaped to a 3D array for vectorization of the pairwise difference calculation.

The computational burden

Let’s pause here and understand the computational burden that we are bearing. dask can visualize the memory allocated for a computation step in a notebook environment. If you build the dask arrays for a source and a transferring image in a notebook cell:
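Here’s a minimal sketch of such a cell (source_image and target_image stand in for preloaded uint8 arrays; the chunk size matches the class default):

# Build the two dask arrays and display them along with the lazy
# pairwise difference; dask's notebook repr reports shape, chunks,
# and the memory each array would occupy.
import dask.array as da
import numpy as np
from IPython.display import display

source_pixels = np.unique(source_image.reshape(-1, 3), axis=0)
target_pixels = np.unique(target_image.reshape(-1, 3), axis=0)

self_da = da.from_array(source_pixels.astype(np.int64), chunks=(1024, 3))
other_da = da.from_array(
    target_pixels.reshape(-1, 1, 3).astype(np.int64),
    chunks=(1024, 1, 3),
)

display(self_da)
display(other_da)
display(other_da - self_da)  # note the size reported for the broadcasted difference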

We get the following images displayed in the notebook output cell:

Dask visualization of computations of pairwise difference and allocating 77GB in the process!

Clearly, the source and transferring dask arrays of unique pixels occupy very little memory. However, the moment we execute the pairwise difference, we would need to allocate 77.29 GB of memory. In a traditional NumPy setting, this would either crash or NumPy would raise an error saying it cannot allocate that much memory. This is where dask shines: its chunking enables the computation of such massive arrays without actually allocating all that memory at once.

Resolving the burden

Ultimately, we don’t need the full matrix of pairwise distances; for each transferring pixel, we only need the source pixel with the smallest distance to it. Returning to the code, we perform the (squared) Euclidean distance calculation and keep only the index of the closest source pixel for each transferring pixel, thus leaving a much smaller output array.

To actually run the computation, we must call idxs_da.compute() and we will make use of the built-in ProgressBar() that comes with dask to display the progress of the calculation. Once we have the indices calculated, we create a dictionary mapping from one RGB source pixel to the closest RGB transferring pixel. Python dictionaries use hashing which means that lookups are very fast (theoretically, it takes constant time). Finally, to transfer the pixels, we loop over all pixels in the transferring image and use the pixel mapping to swap the RGB pixel values.

Let’s view some of the transfers! This time I will use Ito’s flowers and transfer the palette to the rest of the images.

Whole palette transfer from one image to three others

Beautiful outputs. The transferred images have a more natural coloring, less blockiness, and look very similar to their originals. While doing this transfer, I couldn’t help but notice that Gozha’s flower really could’ve had some better transfers. There are definitely some pinks or reddish hues in Ito’s flower that would’ve been better for the transfer but were perhaps just too far away. What I did not realize was the weakness of our method: the Euclidean distance.

A better color distance

Here’s the problem: the Euclidean distance assumes that colors are distributed uniformly within a 255 by 255 by 255 3D cube, and the distances between points in that cube determine how far the colors are from one another. However, we do not perceive those distances uniformly. It’s a very profound question to ask “how far is one red hue from another?”, but it suffices to say that the Euclidean distance does not do our differential perception justice.

So I went scouring the web for a distance measure that is simple to compute and yet does the job to some extent. This site details a calculation in RGB space for a more “correct” perceptual color distance. I’ll leave the discourse to the site itself, but in summary, the author compromises between converting into another color space (like YUV or CIELAB) and staying in RGB with corrective factors that adjust the distance measure. It is also “more stable” in terms of the distance values the algorithm spits out. The distance calculation is as follows (adapted from the site):

$$ \begin{aligned} \bar{r} &= \frac{C_{1,R} + C_{2,R}}{2} \\ \Delta R &= C_{1,R} - C_{2,R} \\ \Delta G &= C_{1,G} - C_{2,G} \\ \Delta B &= C_{1,B} - C_{2,B} \\ \Delta C &= \sqrt{ \Big(2 + \frac{\bar{r}}{256}\Big) \Delta R^2+ 4 \Delta G^2 + \Big(2 + \frac{255 - \bar{r}}{256}\Big) \Delta B^2 } \end{aligned} $$

In the equation, \(C\) is the RGB vector of a pixel, the subscript number indicates which pixel we are referring to (comparing pixel 1 and pixel 2) and the subscript letter is the color channel of the pixel. The calculation first makes an average of the red values between the two pixels. Then, we calculate the difference between each color component of the pixels. The final distance is basically a weighted Euclidean distance with the weights being determined differently for each color component.

We can adapt the code from the site, which leverages some bitwise operation hacks, into Python. For reference, here is the C++ function definition from the site:

typedef struct {
  unsigned char r, g, b;
} RGB;

double ColourDistance(RGB e1, RGB e2)
{
  long rmean = ( (long)e1.r + (long)e2.r ) / 2;
  long r = (long)e1.r - (long)e2.r;
  long g = (long)e1.g - (long)e2.g;
  long b = (long)e1.b - (long)e2.b;
  return sqrt(
    (((512+rmean)*r*r)>>8) + 4*g*g + (((767-rmean)*b*b)>>8)
  );
}
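Before vectorizing, a direct scalar translation of the C++ function might look like this (a sketch; it operates on two RGB triples of Python ints in the 0–255 range):

# A direct scalar translation of ColourDistance() above, including the
# integer halving and the >> 8 shifts.
import math

def colour_distance(c1, c2):
    rmean = (c1[0] + c2[0]) // 2
    r = c1[0] - c2[0]
    g = c1[1] - c2[1]
    b = c1[2] - c2[2]
    return math.sqrt(
        (((512 + rmean) * r * r) >> 8)
        + 4 * g * g
        + (((767 - rmean) * b * b) >> 8)
    )

# e.g. colour_distance((255, 0, 0), (250, 50, 50))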

Comments on computation

To keep computations fast, I will forgo the square root at the end without breaking the distance comparison. This is because the square root is a monotonically increasing function for all non-negative numbers, and the value under the square root in \(\Delta C\) is always non-negative, so all we need to compare to get the closest pixel is the summation term inside it. Also, since we are using dask, we will need to code the vectorization too. Note that we cast to a 64-bit integer type (np.int64) so we can deal with the squares of \(\Delta R\), \(\Delta G\), and \(\Delta B\) without overflowing the uint8 range.

class EntirePalette:
    # ... omitted

    def recolor(self, image):
        image_shape = image.shape
        image = self._preprocess(image)
        image_colors = np.unique(image, axis=0)

        self_da = da.from_array(
            self.source_pixels.astype(np.int64),
            chunks=(self.chunk_size, 3),
        )
        other_da = da.from_array(
            image_colors.reshape(-1, 1, 3).astype(np.int64),
            chunks=(self.chunk_size, 1, 3),
        )

        # Perceptually "better" color distance.
        rmean_da = (other_da[:, :, 0] + self_da[:, 0]) // 2
        rgb_da = other_da - self_da
        r_da = ((512 + rmean_da) * rgb_da[:, :, 0] ** 2) >> 8
        g_da = 4 * rgb_da[:, :, 1] ** 2
        b_da = ((767 - rmean_da) * rgb_da[:, :, 2] ** 2) >> 8
        result_da = (r_da + g_da + b_da).argmin(axis=1)

        with ProgressBar():
            mapping_idx = result_da.compute()

        colormap = {
            tuple(a): tuple(b)
            for a, b
            in zip(image_colors, self.source_pixels[mapping_idx])
        }

        image_recolored = np.array(
            [colormap[tuple(rgb)] for rgb in image]
        )
        return image_recolored.reshape(image_shape).astype(np.uint8)

I hope the code is self-explanatory if you can make the connections to the C++ code above and the original equations. The gist of it is that we vectorized the difference calculation between each color channel and also implemented the bitwise operations per the original code. What do the outputs look like?

Whole palette transfer using the color distance

That did the trick for Gozha’s flower: a more reddish hue is now transferred instead of the bland tan-brown that the Euclidean distance gave. Otherwise, the colors of Rabada’s seaside and Roe’s mountains are about the same; there are very subtle differences in color tones but nothing sticks out. I’ll leave you with this: our visual perception of color is definitely different from the data structures that we have chosen to embody it.

Conclusion

That is it from me on palette transfer. I enjoy doing fun weekend projects like these and documenting them thoroughly. This palette transfer exercise has given me a chance to explore weird ideas I have in a programmatic way and also to discover new tools and learn new skills.

My main future exploration would be into how complex the digital representation of color really is. Color spaces are very intriguing and there is a lot that I don’t yet fully understand. Nevertheless, I hope you enjoyed reading this as much as I did writing it. Until next time!

Attributions