hist

This section describes various options available for histogram plots in fivecentplots.

See the full API

Setup

Import packages:


import fivecentplots as fcp
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pylab as plt
import imageio.v3 as imageio

Read some fake data to generate plots:


df = pd.read_csv(Path(fcp.__file__).parent / 'test_data/fake_data_box.csv')
df.head()

   Batch  Sample    Region  Value            ID
0    101       1  Alpha123    3.5     ID701223A
1    101       1  Alpha123    0.0  ID7700-1222B
2    101       1  Alpha123    3.3     ID701223A
3    101       1  Alpha123    3.2  ID7700-1222B
4    101       1  Alpha123    4.0     ID701223A

Optionally set the design theme (skipping here and using default):


#fcp.set_theme('gray')
#fcp.set_theme('white')

Input data format

fcp.hist supports input data of two formats:

  1. tabular data found in a pd.DataFrame (with or without grouping columns)

  2. image data, either as a single np.array or a dict of np.arrays with a pd.DataFrame consisting of grouping information (see imshow documentation for a more detailed explanation of this format)

Simple histogram

Vertical bars

We calculate a simple histogram using the default of 20 bins:


fcp.hist(df, x='Value')
_images/hist_16_0.png

..note:: “Counts” are automatically calculated based on the data in the “x” column
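As a rough illustration of what the “Counts” axis represents, the sketch below bins a small made-up array with np.histogram. This is only meant to show the idea; it does not claim to be how fivecentplots computes counts internally, and `values` is a hypothetical stand-in for the “Value” column.

```python
import numpy as np

# Hypothetical stand-in for the "Value" column of the DataFrame
values = np.array([3.5, 0.0, 3.3, 3.2, 4.0, 2.1, 5.6, 4.4])

# 20 equal-width bins spanning the data range (same bin count as the default above)
counts, edges = np.histogram(values, bins=20)

# Every sample lands in exactly one bin, and there is one more edge than bins
print(counts.sum(), len(edges))
```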

Horizontal bars

Same data as above but with histogram bars oriented horizontally:


fcp.hist(df, x='Value', horizontal=True)
_images/hist_20_0.png

Bin counts

We can change the number of bins used via the keyword hist_bins or bins:


fcp.hist(df, x='Value', bins=50)
_images/hist_23_0.png

Grouping

Legend

Add a legend:


fcp.hist(df, x='Value', legend='Region')
_images/hist_27_0.png

Row/column plot

Make multiple subplots with different row/column values:


fcp.hist(df, x='Value', legend='Region', col='Batch', row='Sample', ax_size=[250, 250])
_images/hist_30_0.png

Wrap plot

First we wrap the data using a column from the DataFrame:


fcp.hist(df, x='Value', legend='Region', wrap='Batch', ax_size=[250, 250], horizontal=True)
_images/hist_33_0.png

Next we wrap by x which means we make a subplot for each x-column name provided. To illustrate this, we create a couple of new columns in the DataFrame that are just multiples of the “Value” column:


df['Value*2'] = 2*df['Value']
df['Value*3'] = 3*df['Value']
fcp.hist(df, x=['Value', 'Value*2', 'Value*3'], wrap='x', ncol=3, ax_size=[250, 250])
_images/hist_35_0.png

Kernel density estimator

We can overlay a kernel density estimation curve on the histogram using keyword kde=True. These curves can be styled using standard line Element parameters prefixed by kde_:


fcp.hist(df, x='Value', legend='Region', kde=True, kde_width=2)
_images/hist_38_0.png
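For readers unfamiliar with kernel density estimation, here is a minimal NumPy-only sketch of a 1D Gaussian KDE. The helper `gaussian_kde_1d`, the synthetic `samples`, and the normal-reference bandwidth rule are all assumptions chosen for illustration; the estimator and bandwidth that fivecentplots actually uses may differ.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian kernels centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

# Synthetic data standing in for a measurement column
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=200)

# Normal-reference rule of thumb for the bandwidth (an assumption for this sketch)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate on a grid that extends a few bandwidths past the data
grid = np.linspace(samples.min() - 4 * bandwidth, samples.max() + 4 * bandwidth, 400)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to approximately 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
```

The resulting curve is a density, so it is most directly comparable to a normalized histogram rather than to raw counts.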

Other options

A couple of other options are available to present histogram data. Starting with our basic example from the “Simple histogram” section:


fcp.hist(df, x='Value')
_images/hist_41_0.png

Cumulative

Now we enable “cumulative” mode so that each bin contains its own count plus the counts of all preceding bins:


fcp.hist(df, x='Value', cumulative=True)  # or hist_cumulative to be more specific
_images/hist_44_0.png
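The arithmetic behind a cumulative histogram can be sketched with np.cumsum. The `values` array and the bin range below are made up for illustration and are not taken from the test data.

```python
import numpy as np

# Made-up samples and four unit-width bins covering [0, 4]
values = np.array([0.5, 1.5, 1.7, 2.2, 2.4, 2.9, 3.1])
counts, edges = np.histogram(values, bins=4, range=(0, 4))

# Running total: each entry is its own count plus all preceding counts
cumulative = np.cumsum(counts)

print(counts)      # per-bin counts
print(cumulative)  # the last entry equals the total number of samples
```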

Normalize

Histogram normalization divides each bin’s raw count by the total number of counts and by the bin width so that the area under the histogram integrates to 1.


fcp.hist(df, x='Value', normalize=True)  # or hist_normalize to be more specific
_images/hist_47_0.png
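A short sketch of this normalization, assuming it follows the usual density convention (count divided by total count and by bin width), which is what np.histogram does with density=True. The sample values are made up.

```python
import numpy as np

# Made-up samples and four unit-width bins covering [0, 4]
values = np.array([0.5, 1.5, 1.7, 2.2, 2.4, 2.9, 3.1])
counts, edges = np.histogram(values, bins=4, range=(0, 4))

# Divide by the total count and by each bin's width
widths = np.diff(edges)
density = counts / (counts.sum() * widths)

# The bar areas (height * width) sum to 1
area = np.sum(density * widths)

# NumPy's built-in density option gives the same bar heights
density_np, _ = np.histogram(values, bins=4, range=(0, 4), density=True)
```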

Images

fcp.hist is a powerful tool for data analysis of RAW and color images in image sensor / camera engineering activities. By default, image histograms are automatically converted to line plots with a histogram bin for each digital code from 0 to 2**bit_depth - 1. Additional options are also provided to split RAW images by color-filter array (CFA) pattern and color images by channel.

..warning:: For images with high bit-depth and thus a very high number of bins, use of np.histogram can be slow. However, if the image data is of integer data type, fivecentplots will use np.bincount, which is much faster. Therefore, we recommend using integer-type image data wherever possible.
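To see why integer data allows the faster path, the sketch below counts digital codes both ways on a synthetic 10-bit image and checks that the results agree. The image and variable names are made up; this mirrors the idea in the warning above rather than the library's internal code.

```python
import numpy as np

bit_depth = 10
rng = np.random.default_rng(0)

# Synthetic integer image with codes in [0, 2**bit_depth - 1]
img = rng.integers(0, 2**bit_depth, size=(300, 300), dtype=np.uint16)

# One bin per digital code via direct counting (requires non-negative integers)
counts_bincount = np.bincount(img.ravel(), minlength=2**bit_depth)

# Same result with np.histogram and unit-width bins, which must search bin edges
counts_hist, _ = np.histogram(img, bins=np.arange(2**bit_depth + 1))
```

np.bincount only works on non-negative integers, so float image data would need to fall back to np.histogram.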

RAW

First, consider a simple example of a 300x300 gray patch with all pixel values at the mid-level of a 10-bit range (stored as 16-bit unsigned integers) from a camera with no color-filter array.


h, w = 300, 300
img = (np.ones([h, w]) * (2**10 - 1) / 2).astype(np.uint16)
img

array([[511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       ...,
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511]], dtype=uint16)

fcp.imshow(img, cmap='gray', zmin=0, zmax=2**10)
_images/hist_52_0.png

In this case, our histogram is a single point with 300 * 300 counts:


fcp.hist(img, markers=False, ax_size=[600, 400], line_width=2)
_images/hist_54_0.png

Now let’s multiply our patch by a 2D Gaussian to approximate lens shading:


x, y = np.meshgrid(np.linspace(-1, 1, 300), np.linspace(-1, 1, 300))
dst = np.sqrt(x*x+y*y)
sigma = 1
muu = 0.001

gauss = np.exp(-((dst-muu)**2 / (2.0 * sigma**2 )))
img2 = (gauss * img).astype(np.uint16)
fcp.imshow(img2, cmap='gray', zmin=0, zmax=2**10)
_images/hist_56_0.png

The resulting histogram is shown below:


fcp.hist(img2, markers=False, line_width=2)
_images/hist_58_0.png

Bayer

Now let’s mock up a Bayer array for a light blue color patch and demonstrate how fivecentplots allows you to easily split the histogram into distinct color planes (based on a CFA pattern). Here we’ll assume a “GRBG” CFA:


img_rgb = np.zeros([300, 300]).astype(np.uint16)
img_rgb[::2, ::2] = 180  # green_red
img_rgb[1::2, 1::2] = 180  # green_blue
img_rgb[::2, 1::2] = 10
img_rgb[1::2, ::2] = 255
fcp.imshow(img_rgb, cmap='Set1')
_images/hist_61_0.png

After basic demosaicing (no edge treatment), this would give:


import colour_demosaicing
fcp.imshow(colour_demosaicing.demosaicing_CFA_Bayer_bilinear(np.array(img_rgb), 'GRBG').astype(np.uint8))
_images/hist_63_0.png

By default, fcp.hist does not distinguish between pixel CFA types, so we end up with three distinct histogram peaks:


fcp.hist(img_rgb, markers=False, ax_scale='logy', ax_size=[600, 400], line_width=2, xmin=-5, xmax=260, ymin=1, ymax=60000)
_images/hist_65_0.png

However, if we specify the CFA via the keyword cfa, a new column in the grouping data named “Plane” is created. We can then legend by this color plane. Notice in this example that the “gr” and “gb” pixels overlap.


fcp.hist(img_rgb, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, xmin=-5, xmax=260,
         colors=fcp.RGGB)
_images/hist_67_0.png

..note:: For better visualization above, we also invoke a special color scheme shortcut fcp.RGGB to color the histograms for each plane according to their filter color
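Conceptually, splitting by CFA means taking every other pixel in each direction, starting from the offset of each color within the 2x2 tile. The sketch below does this with NumPy slicing for the “GRBG” patch defined earlier. The plane labels “r” and “b” and the `offsets` mapping are assumptions for illustration; only “gr” and “gb” are named in the text above.

```python
import numpy as np

# Rebuild the mock Bayer patch from above
img = np.zeros((300, 300), dtype=np.uint16)
img[::2, ::2] = 180    # green pixels on red rows
img[1::2, 1::2] = 180  # green pixels on blue rows
img[::2, 1::2] = 10    # red
img[1::2, ::2] = 255   # blue

# For "GRBG" the repeating 2x2 tile is
#   G R
#   B G
# Hypothetical labels mapped to (row offset, column offset) within the tile
offsets = {'gr': (0, 0), 'r': (0, 1), 'b': (1, 0), 'gb': (1, 1)}

# Extract each color plane and count its 8-bit codes separately
planes = {name: img[r::2, c::2] for name, (r, c) in offsets.items()}
hists = {name: np.bincount(p.ravel(), minlength=256) for name, p in planes.items()}
```

Because both green planes hold the same value here, their histograms are identical, which is why the “gr” and “gb” curves overlap in the plot.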

Now let’s add some shading and noise to make a more meaningful histogram. This results in the color patch below.


# Gaussian shading
x, y = np.meshgrid(np.linspace(-1, 1, 300), np.linspace(-1, 1, 300))
dst = np.sqrt(x*x+y*y)
sigma = 1
muu = 0.001
gauss = np.exp(-((dst-muu)**2 / (2.0 * sigma**2)))
img_rgb2 = (gauss * img_rgb).astype(float)

# Random noise
img_rgb2[::2, ::2] += np.random.normal(-0.1*img_rgb2[::2, ::2].mean(), 0.1*img_rgb2[::2, ::2].mean(), img_rgb2[::2, ::2].shape)
img_rgb2[1::2, ::2] += np.random.normal(-0.1*img_rgb2[1::2, ::2].mean(), 0.1*img_rgb2[1::2, ::2].mean(), img_rgb2[1::2, ::2].shape)
img_rgb2[1::2, 1::2] += np.random.normal(-0.1*img_rgb2[1::2, 1::2].mean(), 0.1*img_rgb2[1::2, 1::2].mean(), img_rgb2[1::2, 1::2].shape)
img_rgb2[::2, 1::2] += np.random.normal(-0.1*img_rgb2[::2, 1::2].mean(), 0.1*img_rgb2[::2, 1::2].mean(), img_rgb2[::2, 1::2].shape)
img_rgb2 = img_rgb2.astype(np.uint16)
fcp.imshow(colour_demosaicing.demosaicing_CFA_Bayer_bilinear(img_rgb2, 'GRBG').astype(np.uint8))
_images/hist_70_0.png

Again, invoking the cfa keyword and a legend by “Plane”, we get the following:


fcp.hist(img_rgb2, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.RGGB)
_images/hist_72_0.png

RGB images

fcp.hist also provides support for RGB data:


img_rgb = imageio.imread(Path(fcp.__file__).parent / 'test_data/imshow_cat_pirate.png')
fcp.imshow(img_rgb)
_images/hist_75_0.png

If no color channel information is provided to fcp.hist, the luminosity histogram of the grayscale representation of the RGB image is provided:


fcp.hist(img_rgb, markers=False, ax_size=[600, 400], line_width=2, line_color='#555555')
_images/hist_77_0.png

If color channel separation is desired, use the grouping label “Channel” (which is automatically calculated by fivecentplots) with a grouping kwarg:


fcp.hist(img_rgb, legend='Channel', markers=False, ax_size=[600, 400], line_width=2, colors=fcp.RGB)
_images/hist_79_0.png

..note:: For better visualization above, we also invoke a special color scheme shortcut fcp.RGB to color the histograms according to the specific color channel
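The two views of an RGB image described above (a single luminosity histogram versus one histogram per channel) can be sketched in NumPy as follows. The grayscale conversion used by fivecentplots is not stated in this section, so the Rec. 601 luma weights below are an assumption, and the random `rgb` array is a stand-in for a real image.

```python
import numpy as np

# Synthetic 8-bit RGB image (height, width, channel)
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Assumed grayscale conversion: Rec. 601 luma weights for R, G, B
weights = np.array([0.299, 0.587, 0.114])
gray = np.round(rgb @ weights).astype(np.uint8)

# Luminosity histogram: one count per 8-bit code of the grayscale image
luma_counts = np.bincount(gray.ravel(), minlength=256)

# Per-channel histograms, keyed by channel name
channel_counts = {
    name: np.bincount(rgb[..., i].ravel(), minlength=256)
    for i, name in enumerate(['R', 'G', 'B'])
}
```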

PDF

fivecentplots histograms can be converted to probability density functions inline using the keyword pdf=True. Here we use the shaded color patch with noise from above for our input.


fcp.hist(img_rgb2, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.RGGB, pdf=True)
_images/hist_83_0.png

CDF

fivecentplots histograms can also be converted to cumulative distribution functions inline using the keyword cdf=True. With no color plane separation, using the shaded gray patch (img2) from above:


fcp.hist(img2, markers=False, ax_size=[600, 400], line_width=2, colors=fcp.RGGB, cdf=True)
_images/hist_86_0.png

With color plane separation:


fcp.hist(img_rgb2, markers=False, ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.RGGB, cdf=True)
_images/hist_88_0.png
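For image data with one bin per digital code, the relationship between counts, PDF, and CDF can be sketched as below. This assumes the PDF is the per-code count divided by the total pixel count and the CDF is its running sum; the synthetic 8-bit image is made up and the exact normalization inside fivecentplots is not shown in this section.

```python
import numpy as np

# Synthetic 8-bit integer image
rng = np.random.default_rng(0)
img = rng.integers(0, 2**8, size=(100, 100), dtype=np.uint16)

# Counts per digital code
counts = np.bincount(img.ravel(), minlength=2**8)

# PDF: fraction of pixels at each code (bin width is 1 code)
pdf = counts / counts.sum()

# CDF: fraction of pixels at or below each code
cdf = np.cumsum(pdf)
```

The CDF is non-decreasing and ends at 1, which makes it convenient for reading off percentiles of the pixel distribution.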

Styles

Bar style

Colors

The bar edge and fill colors can be controlled by kwargs:


fcp.hist(df, x='Value', hist_edge_color='#555555', hist_edge_width=2, hist_fill_alpha=1, hist_fill_color='#FF0000')
_images/hist_93_0.png

Alignment

The alignment of the bars relative to the ticks on the x-axis can be adjusted. Options include: {‘left’, ‘mid’ [default], ‘right’}


fcp.hist(df, x='Value', hist_align='right')
_images/hist_96_0.png

fcp.hist(df, x='Value', hist_align='mid')
_images/hist_97_0.png

Width

The relative width (i.e., the fraction of the overall bin width) of the bars can be controlled by the keyword hist_rwidth:


fcp.hist(df, x='Value', hist_rwidth=0.3)
_images/hist_100_0.png
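The geometry implied by the alignment and relative width options can be sketched as follows, assuming they behave like the matplotlib hist parameters of the same names (align and rwidth). The helper `bar_geometry` is hypothetical and exists only for this illustration.

```python
import numpy as np

def bar_geometry(edges, align='mid', rwidth=1.0):
    """Hypothetical helper: return bar centers and bar widths for given bin edges."""
    edges = np.asarray(edges, dtype=float)
    bin_widths = np.diff(edges)
    if align == 'left':
        centers = edges[:-1]                   # centered on left bin edges
    elif align == 'mid':
        centers = edges[:-1] + bin_widths / 2  # centered between bin edges
    elif align == 'right':
        centers = edges[1:]                    # centered on right bin edges
    else:
        raise ValueError(f"unknown align value: {align!r}")
    return centers, rwidth * bin_widths

edges = [0.0, 1.0, 2.0, 3.0]
centers_mid, widths_full = bar_geometry(edges)
centers_right, widths_narrow = bar_geometry(edges, align='right', rwidth=0.3)
```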

fcp.HIST preset

For more convenient styling of histogram plots from image data, we provide a “preset” dictionary with some common kwargs already defined:


fcp.HIST

{'ax_scale': 'logy', 'markers': False, 'line_width': 2, 'preset': 'HIST'}

Without the preset:


fcp.hist(img_rgb2, ax_size=[600, 400])
_images/hist_105_0.png

With the preset:


fcp.hist(img_rgb2, ax_size=[600, 400], **fcp.HIST)
_images/hist_107_0.png
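Since the preset is a plain dictionary, a value can be overridden by merging it into a new dictionary before unpacking. The sketch below uses a copy of the preset shown above and a hypothetical `describe` function in place of a real plotting call so that it runs without fivecentplots installed.

```python
# Copy of the preset shown above, reproduced so the snippet runs standalone
HIST = {'ax_scale': 'logy', 'markers': False, 'line_width': 2, 'preset': 'HIST'}

def describe(**kwargs):
    """Hypothetical stand-in for a plotting call; returns the kwargs it received."""
    return kwargs

# Merge first, then unpack: later keys win, and the preset itself is unchanged
overridden = describe(**{**HIST, 'line_width': 1})

# Passing the same keyword twice is a Python error, independent of any library
try:
    describe(line_width=1, **HIST)
    raised = False
except TypeError:
    raised = True
```

With fivecentplots, the same merge pattern would take the form `fcp.hist(img_rgb2, **{**fcp.HIST, 'line_width': 1})`.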