hist

This section describes the options available for histogram plots in fivecentplots.

See the full API

Setup

Import packages:


import fivecentplots as fcp
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt

Read some fake data to generate plots:


df = pd.read_csv(Path(fcp.__file__).parent / 'test_data/fake_data_box.csv')
df.head()

Batch Sample Region Value ID
0 101 1 Alpha123 3.5 ID701223A
1 101 1 Alpha123 0.0 ID7700-1222B
2 101 1 Alpha123 3.3 ID701223A
3 101 1 Alpha123 3.2 ID7700-1222B
4 101 1 Alpha123 4.0 ID701223A

Optionally set the design theme (skipping here and using default):


#fcp.set_theme('gray')
#fcp.set_theme('white')

Simple histogram

Vertical bars

We calculate a simple histogram with the default of 20 bins.

Note that “counts” are automatically calculated based on the data in the “x” column.


fcp.hist(df, x='Value')
_images/hist_14_0.png
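
To make the automatic “counts” calculation concrete, here is a minimal NumPy sketch of the same idea. It uses hypothetical random data rather than the test file, and it assumes equal-width bins spanning the data range; the binning fivecentplots performs internally may differ in detail.

```python
import numpy as np

# Hypothetical stand-in for the "Value" column (not the real test data)
rng = np.random.default_rng(0)
values = rng.normal(loc=3.0, scale=1.0, size=500)

# 20 equal-width bins spanning the range of the data
counts, edges = np.histogram(values, bins=20)

print(counts.sum())  # every sample lands in exactly one bin
print(len(edges))    # 20 bins are delimited by 21 edges
```

The counts array holds one entry per bin, which is what the bar heights in the plot represent.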

Horizontal bars

Same data as above but with histogram bars oriented horizontally:


fcp.hist(df, x='Value', horizontal=True)
_images/hist_17_0.png

Bin counts

We can change the number of bins used via the keyword hist_bins or bins:


fcp.hist(df, x='Value', bins=50)
_images/hist_20_0.png

Grouping

Legend

Add a legend:


fcp.hist(df, x='Value', legend='Region')
_images/hist_24_0.png

Row/column plot

Make multiple subplots with different row/column values:


fcp.hist(df, x='Value', legend='Region', col='Batch', row='Sample', ax_size=[250, 250])
_images/hist_27_0.png

Wrap plot

First we wrap the data using a column from the DataFrame:


fcp.hist(df, x='Value', legend='Region', wrap='Batch', ax_size=[250, 250], horizontal=True)
_images/hist_30_0.png

Next we wrap by x which means we make a subplot for each x-column name provided. To illustrate this, we create a couple of new columns in the DataFrame that are just multiples of the “Value” column:


df['Value*2'] = 2*df['Value']
df['Value*3'] = 3*df['Value']
fcp.hist(df, x=['Value', 'Value*2', 'Value*3'], wrap='x', ncol=3, ax_size=[250, 250])
_images/hist_32_0.png

Kernel density estimator

We can overlay a kernel density estimation curve on the histogram using keyword kde=True. These curves can be styled using standard line Element parameters prefixed by kde_:


fcp.hist(df, x='Value', legend='Region', kde=True, kde_width=2)
_images/hist_35_0.png
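
For readers unfamiliar with kernel density estimation, the sketch below builds a Gaussian KDE by hand with NumPy. The data, the helper name gaussian_kde_sketch, and the normal-reference bandwidth rule are all assumptions made for illustration; the estimator and bandwidth that fivecentplots uses are not specified here and may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=200)  # hypothetical data


def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one per sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Normal-reference rule of thumb for the bandwidth (an assumption)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluate on a grid that extends a few bandwidths past the data
grid = np.linspace(samples.min() - 4 * bandwidth,
                   samples.max() + 4 * bandwidth, 2001)
density = gaussian_kde_sketch(samples, grid, bandwidth)

# Riemann-sum check: a density should integrate to roughly 1
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother curve; a smaller one follows individual samples more closely.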

Other options

A couple of other options are available to present histogram data. Starting with our basic example from the “Simple histogram” section:


fcp.hist(df, x='Value')
_images/hist_38_0.png

Cumulative

Now we enable “cumulative” mode so that each bin contains its own count plus the counts of all previous bins:


fcp.hist(df, x='Value', hist_cumulative=True)
_images/hist_41_0.png
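
The cumulative mode amounts to a running sum over the per-bin counts. A small sketch with hypothetical counts:

```python
import numpy as np

counts = np.array([2, 5, 9, 3, 1])  # hypothetical per-bin counts
cumulative = np.cumsum(counts)      # running total, bin by bin

print(cumulative.tolist())  # [2, 7, 16, 19, 20]
```

The last cumulative bin always equals the total number of samples.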

Normalize

Next we enable hist_normalize=True to normalize the histogram such that each bin’s raw count is divided by the total number of counts and the bin width so that the area under the histogram integrates to 1.


fcp.hist(df, x='Value', hist_normalize=True)
_images/hist_44_0.png
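
The normalization described above can be checked numerically. This sketch uses hypothetical data and NumPy only; it divides each count by the total count and the bin width, then confirms that the area is 1 and that NumPy's own density=True option gives the same result.

```python
import numpy as np

rng = np.random.default_rng(2)
values = rng.normal(3.0, 1.0, size=1000)  # hypothetical data

counts, edges = np.histogram(values, bins=20)
widths = np.diff(edges)

# Divide each raw count by (total count * bin width)
density = counts / (counts.sum() * widths)

# Area under the histogram: sum of bar height * bar width
area = np.sum(density * widths)
print(round(area, 6))

# np.histogram applies the same normalization with density=True
density_np, _ = np.histogram(values, bins=20, density=True)
print(np.allclose(density, density_np))
```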

Images

hist plots in fivecentplots can also be used to generate histograms of pixel values from raw images (helpful for image sensor / camera engineering activities). Options are also provided to split these raw images by color-filter array (CFA) pattern. When plotting histograms of images, it is assumed that each digital number in the image data should be represented by its own bin, so the number of bins is auto-calculated based on the min/max pixel values in the image data.

Mono

First, consider a simple example of a 300x300 gray patch with all pixel values near the mid-level of a 10-bit camera (no color-filter array).


h, w = 300, 300
img = (np.ones([h, w]) * (2**10 - 1) / 2).astype(np.uint16)
img

array([[511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       ...,
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511],
       [511, 511, 511, ..., 511, 511, 511]], dtype=uint16)

plt.imshow(img, cmap='gray', vmin=0, vmax=2**10)

<matplotlib.image.AxesImage at 0x7ff612dad7f0>
_images/hist_49_1.png

In this case, our histogram is a single point with 300 * 300 counts:


fcp.hist(img, markers=False, ax_size=[600, 400], line_width=2)
_images/hist_51_0.png
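
As a rough illustration of the “one bin per digital number” rule, the sketch below rebuilds the gray patch and counts pixels with unit-width bins between the minimum and maximum values. This is an assumption about how such bins could be derived, not a copy of the fivecentplots implementation.

```python
import numpy as np

# Same gray patch as above: 1023 / 2 truncates to 511 as uint16
img = (np.ones([300, 300]) * (2**10 - 1) / 2).astype(np.uint16)

# One bin per digital number between min and max (inclusive)
lo, hi = int(img.min()), int(img.max())
edges = np.arange(lo, hi + 2)  # integer edges give bins of width 1
counts, _ = np.histogram(img, bins=edges)

print(lo, hi, counts.tolist())  # 511 511 [90000]
```

With a single pixel value there is a single bin holding all 300 * 300 = 90000 counts.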

Alert! What just happened? If you were paying close attention, you noticed we did not pass a DataFrame to the hist plot. This is a sneaky, under-the-table trick that exists to make life easier when using hist or imshow with 2D numpy.ndarrays. These arrays are converted into DataFrames behind the scenes so you don’t have to take an extra step. This can be our dirty little secret…

Now let’s multiply our patch by a 2D Gaussian to approximate lens shading:


x, y = np.meshgrid(np.linspace(-1,1,300), np.linspace(-1,1,300))
dst = np.sqrt(x*x+y*y)
sigma = 1
muu = 0.001

gauss = np.exp(-( (dst-muu)**2 / ( 2.0 * sigma**2 ) ) )
img2 = (gauss * img).astype(np.uint16)
plt.imshow(img2, cmap='gray', vmin=0, vmax=2**10)

<matplotlib.image.AxesImage at 0x7ff610015e20>
_images/hist_54_1.png

fcp.hist(img2, markers=False, line_width=2)
_images/hist_55_0.png

RGB

Now let’s mock up a Bayer array for a light blue color patch and show how fivecentplots allows you to easily split the histogram into distinct color planes (based on a CFA pattern). Here we’ll assume “GRBG”:


img_rgb = np.zeros([300, 300]).astype(np.uint16)
img_rgb[::2, ::2] = 180  # green_red
img_rgb[1::2, 1::2] = 180  # green_blue
img_rgb[::2, 1::2] = 10  # red
img_rgb[1::2, ::2] = 255  # blue
plt.imshow(img_rgb)

<matplotlib.image.AxesImage at 0x7ff610455f40>
_images/hist_58_1.png

Which upon demosaicing would give:


import colour_demosaicing
plt.imshow(colour_demosaicing.demosaicing_CFA_Bayer_bilinear(np.array(img_rgb), 'GRBG').astype(np.uint16))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

<matplotlib.image.AxesImage at 0x7ff64daf9760>
_images/hist_60_2.png

By default, the hist function pays no mind to any differences in image pixels due to CFA:


fcp.hist(img_rgb, markers=False, ax_scale='logy', ax_size=[600, 400], line_width=2, xmin=-5, xmax=260, ymin=1, ymax=60000)
_images/hist_62_0.png

However, we can specify a CFA when plotting the histogram using the keyword cfa="grbg" and then legend by color plane. For convenience, we also invoke the special color scheme shortcut fcp.BAYER to color the planes according to their filter color. Notice in this example that the “gr” and “gb” pixels overlap.


fcp.hist(img_rgb, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, xmin=-5, xmax=260,
         colors=fcp.BAYER)
_images/hist_64_0.png
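
To show what splitting by CFA means at the array level, here is a small NumPy sketch that slices the mock Bayer image into its four planes. The helper split_cfa is hypothetical and written only for this illustration; it is not the fivecentplots API, and the plane names simply follow the “gr”/“gb” convention used in the text.

```python
import numpy as np

# Rebuild the mock GRBG image from above
img_rgb = np.zeros([300, 300], dtype=np.uint16)
img_rgb[::2, ::2] = 180    # green_red
img_rgb[1::2, 1::2] = 180  # green_blue
img_rgb[::2, 1::2] = 10    # red
img_rgb[1::2, ::2] = 255   # blue


def split_cfa(img, cfa):
    """Hypothetical helper: map plane name -> sub-array for a 2x2 CFA."""
    cfa = cfa.lower()
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (row, col) in the 2x2 tile
    planes = {}
    for i, (row, col) in enumerate(offsets):
        name = cfa[i]
        if name == 'g':
            # Name each green after the other color in its row (gr or gb)
            name = 'g' + cfa[i ^ 1]
        planes[name] = img[row::2, col::2]
    return planes


planes = split_cfa(img_rgb, 'grbg')
for name, plane in planes.items():
    print(name, plane.shape, np.unique(plane).tolist())
```

Both green planes contain only the value 180, which is why the “gr” and “gb” curves sit on top of each other in the plot.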

Now let’s add some shading and noise to make a more meaningful histogram. This results in the color patch below.


# Gaussian shading
x, y = np.meshgrid(np.linspace(-1,1,300), np.linspace(-1,1,300))
dst = np.sqrt(x*x+y*y)
sigma = 1
muu = 0.001
gauss = np.exp(-( (dst-muu)**2 / ( 2.0 * sigma**2 ) ) )
img_rgb2 = (gauss * img_rgb).astype(float)

# Random noise
img_rgb2[::2, ::2] += np.random.normal(-0.1*img_rgb2[::2, ::2].mean(), 0.1*img_rgb2[::2, ::2].mean(), img_rgb2[::2, ::2].shape)
img_rgb2[1::2, ::2] += np.random.normal(-0.1*img_rgb2[1::2, ::2].mean(), 0.1*img_rgb2[1::2, ::2].mean(), img_rgb2[1::2, ::2].shape)
img_rgb2[1::2, 1::2] += np.random.normal(-0.1*img_rgb2[1::2, 1::2].mean(), 0.1*img_rgb2[1::2, 1::2].mean(), img_rgb2[1::2, 1::2].shape)
img_rgb2[::2, 1::2] += np.random.normal(-0.1*img_rgb2[::2, 1::2].mean(), 0.1*img_rgb2[::2, 1::2].mean(), img_rgb2[::2, 1::2].shape)
img_rgb2 = img_rgb2.astype(np.uint16)
plt.imshow(colour_demosaicing.demosaicing_CFA_Bayer_bilinear(np.array(img_rgb2), 'GRBG').astype(np.uint16))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

<matplotlib.image.AxesImage at 0x7ff64c9d0820>
_images/hist_66_2.png

Again, invoking the cfa keyword with legending we get the following:


fcp.hist(img_rgb2, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.BAYER)
_images/hist_68_0.png

While this example is a bit contrived, it demonstrates the power of fivecentplots for raw image analysis in industries using image sensors or cameras.

PDF

fivecentplots histograms can be converted to probability density functions inline using the keyword pdf=True. Here we use the shaded color patch with noise from above for our input.


fcp.hist(img_rgb2, markers=False, ax_scale='logy', ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.BAYER, pdf=True)
_images/hist_72_0.png
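
One plausible reading of the conversion, sketched with NumPy on a hypothetical 8-bit image: divide each per-value count by the total number of pixels. This is an assumption for illustration; the exact normalization behind pdf=True is not stated here.

```python
import numpy as np

rng = np.random.default_rng(3)
pixels = rng.integers(0, 256, size=(100, 100))  # hypothetical 8-bit image

# One count per digital number 0..255
counts = np.bincount(pixels.ravel(), minlength=256)

# Probability of each digital number; with unit-width bins this is
# also the probability density
pdf = counts / counts.sum()

print(len(pdf), round(pdf.sum(), 6))
```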

CDF

fivecentplots histograms can also be converted to cumulative distribution functions inline using the keyword cdf=True. Again, we use the shaded color patch with noise from above for our input. With no color plane separation:


fcp.hist(img_rgb2, markers=False, ax_size=[600, 400], line_width=2, colors=fcp.BAYER, cdf=True)
_images/hist_75_0.png

With color plane separation:


fcp.hist(img_rgb2, markers=False, ax_size=[600, 400], legend='Plane', cfa='grbg', line_width=2, colors=fcp.BAYER, cdf=True)
_images/hist_77_0.png
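
A cumulative distribution function can be sketched as the running sum of the per-value counts divided by the total. The example below uses a hypothetical 8-bit image and assumes this definition; it also shows one practical use, reading the median pixel value off the curve.

```python
import numpy as np

rng = np.random.default_rng(4)
pixels = rng.integers(0, 256, size=(100, 100))  # hypothetical 8-bit image

counts = np.bincount(pixels.ravel(), minlength=256)

# Fraction of pixels at or below each digital number
cdf = np.cumsum(counts) / counts.sum()

# First digital number where at least half of the pixels are covered
median_dn = int(np.searchsorted(cdf, 0.5))
print(median_dn, round(cdf[-1], 6))
```

The curve never decreases and ends at 1, which makes percentiles easy to read from it.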

Styles

Bar style

Colors

The bar edge and fill colors can be controlled by kwargs:


fcp.hist(df, x='Value', hist_edge_color='#555555', hist_edge_width=2, hist_fill_alpha=1, hist_fill_color='#FF0000')
_images/hist_82_0.png

Alignment

The alignment of the bars relative to the ticks on the x-axis can be adjusted. Options include ‘left’, ‘mid’, and ‘right’.


fcp.hist(df, x='Value', hist_edge_color='#555555', hist_edge_width=2, hist_fill_alpha=1, hist_fill_color='#FF0000',
         hist_align='left')
_images/hist_85_0.png

fcp.hist(df, x='Value', hist_edge_color='#555555', hist_edge_width=2, hist_fill_alpha=1, hist_fill_color='#FF0000',
         hist_align='mid')
_images/hist_86_0.png
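
The sketch below shows where bars would be centered for each alignment option, assuming hist_align follows the meaning of the align argument in matplotlib's hist (bars centered on the left bin edges, between the edges, or on the right bin edges). That mapping is an assumption, and the bin edges are hypothetical.

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical bin edges


def bar_centers(edges, align):
    """Bar center positions under matplotlib-style alignment."""
    if align == 'left':
        return edges[:-1]                    # on the left bin edges
    if align == 'mid':
        return (edges[:-1] + edges[1:]) / 2  # between the bin edges
    if align == 'right':
        return edges[1:]                     # on the right bin edges
    raise ValueError(f"unknown align: {align!r}")


for align in ('left', 'mid', 'right'):
    print(align, bar_centers(edges, align).tolist())
# left [0.0, 1.0, 2.0]
# mid [0.5, 1.5, 2.5]
# right [1.0, 2.0, 3.0]
```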

Width

The relative width of the bars can be controlled by the keyword hist_rwidth:


fcp.hist(df, x='Value', hist_edge_color='#555555', hist_edge_width=2, hist_fill_alpha=1, hist_fill_color='#FF0000',
         hist_align='mid', hist_rwidth=0.3)
_images/hist_89_0.png
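
A relative width scales each bar as a fraction of its bin width. The sketch below computes the resulting bar extents for hypothetical bin edges, assuming hist_rwidth behaves like matplotlib's rwidth and that bars are centered in their bins.

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical bin edges
rwidth = 0.3

bin_widths = np.diff(edges)
bar_widths = rwidth * bin_widths       # each bar fills 30% of its bin
centers = (edges[:-1] + edges[1:]) / 2

bar_left = centers - bar_widths / 2
bar_right = centers + bar_widths / 2
print(np.round(bar_left, 2).tolist(), np.round(bar_right, 2).tolist())
```

The remaining 70% of each bin is left as empty space between bars.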

fcp.HIST

When plotting histograms from images, a helpful shortcut dictionary of useful keyword args is available via fcp.HIST:


fcp.HIST

{'ax_scale': 'logy', 'markers': False, 'line_width': 2, 'preset': 'HIST'}

Drawing on the example above, this shortcut would be used as follows:


fcp.hist(img_rgb2, ax_size=[600, 400], **fcp.HIST)
_images/hist_94_0.png
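
Because the shortcut is an ordinary dictionary, the usual Python rules for keyword unpacking apply. The sketch below uses a copy of the dictionary shown above and a hypothetical stand-in function (show_kwargs) instead of a real plotting call; whether fcp.hist honors an overridden value together with the 'preset' entry is not verified here.

```python
# Copy of the dictionary displayed above
HIST = {'ax_scale': 'logy', 'markers': False, 'line_width': 2, 'preset': 'HIST'}


def show_kwargs(**kwargs):
    """Hypothetical stand-in for a plotting call; returns what it received."""
    return kwargs


# Unpacking passes each key as a separate keyword argument
received = show_kwargs(ax_size=[600, 400], **HIST)
print(sorted(received))

# Repeating a key next to the unpacked dict raises a TypeError
try:
    show_kwargs(line_width=1, **HIST)
    duplicate_ok = True
except TypeError:
    duplicate_ok = False
print(duplicate_ok)  # False

# To change one value, merge into a new dict; later keys win
custom = {**HIST, 'line_width': 1}
print(custom['line_width'], HIST['line_width'])  # 1 2
```

Merging into a new dictionary leaves the original shortcut untouched for later calls.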

Compared to without the shortcut:


fcp.hist(img_rgb2, ax_size=[600, 400])
_images/hist_96_0.png