Grouping data

A single pandas DataFrame can often be decomposed into smaller subsets of data that need to be visualized. fivecentplots contains multiple grouping levels to expose unique subsets of data:

  • legend: color lines and markers according to unique values in a DataFrame column

  • groups:

    • for xy plots: separates unique subsets so plot lines are not continuously looped back to the origin (useful for replicates of similar data)

    • for boxplots: groups boxes by unique values in one or more DataFrame columns

  • row | col: makes a grid of subplots within a single figure, one subplot for each unique value of the DataFrame columns named in these keyword arguments

  • wrap:

    • Option 1: similar to row and column grouping, wrap makes a grid of subplots within a single figure, one subplot for each unique value of the DataFrame column(s) named in the wrap keyword

    • Option 2: wrap by x or y to create a unique subplot for each column name listed

  • figure: makes a unique figure for each unique value of a DataFrame column

It is also possible to filter data sets inline via the keyword filter.

Setup

Imports

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import fivecentplots as fcp
import pandas as pd
import numpy as np
import os, sys, pdb
osjoin = os.path.join
db = pdb.set_trace

Sample data

In [2]:
df1 = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_data.csv'))
df2 = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_data_box.csv'))

Set theme

(Only needs to be run once)

In [3]:
#fcp.set_theme('gray')
#fcp.set_theme('white')

Other

In [4]:
SHOW = False

legend

The legend keyword can be a single DataFrame column name or a list of column names. The data set will then be grouped according to each unique value of the legend column and a separate plot for each value will be added to the figure. A different color and marker type will be used to display each value.

Single legend column

In our sample data set, we have repeats of the same current vs. voltage measurement at three different “Die” locations. By setting the legend keyword equal to the name of the DataFrame column containing the “Die” value, we can distinctly visualize the measurement for each site.

In [5]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/grouping_16_0.png
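
For readers more familiar with plain pandas, the filter and legend keywords in the call above can be approximated as a boolean mask followed by a lookup of unique values. This is only a sketch on a small made-up DataFrame (the values are invented for illustration and are not taken from fake_data.csv); it does not show how fivecentplots parses the filter string internally.

```python
import pandas as pd

# Invented sample rows, for illustration only
df = pd.DataFrame({
    'Substrate': ['Si', 'Si', 'GaAs', 'Si'],
    'Target Wavelength': [450, 450, 450, 460],
    'Die': ['(1,1)', '(2,-1)', '(1,1)', '(-1,2)'],
    'Voltage': [0.1, 0.1, 0.1, 0.1],
})

# Rough pandas equivalent of
# filter='Substrate=="Si" & Target Wavelength==450'
mask = (df['Substrate'] == 'Si') & (df['Target Wavelength'] == 450)
subset = df[mask]

# Each unique value of the legend column becomes one legend entry
legend_values = sorted(subset['Die'].unique())
```

Only the rows that survive the mask contribute legend entries, which is why filtering and legending are usually tuned together.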

Multiple legend columns

fivecentplots also supports legending by multiple DataFrame columns. When a list of column names is passed to the legend keyword, a dummy column is created in the DataFrame that concatenates the values from each column listed in legend. This new column is used for legending.

In [6]:
fcp.plot(df1, x='Voltage', y='I [A]', legend=['Die', 'Substrate'], show=SHOW,
         filter='Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/grouping_19_0.png
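
The dummy column described above can be sketched in plain pandas. The column name legend_key and the ' | ' separator are assumptions chosen for illustration; the name and formatting that fivecentplots uses internally may differ.

```python
import pandas as pd

# Invented sample rows, for illustration only
df = pd.DataFrame({
    'Die': ['(1,1)', '(1,1)', '(2,-1)'],
    'Substrate': ['Si', 'GaAs', 'Si'],
})

legend_cols = ['Die', 'Substrate']

# Concatenate the values of each legend column row by row.
# 'legend_key' and ' | ' are hypothetical choices, not fivecentplots names.
df['legend_key'] = df[legend_cols].astype(str).agg(' | '.join, axis=1)

# One legend entry per unique combination of the listed columns
legend_values = df['legend_key'].unique().tolist()
```

Because the combined value is unique per combination, two dies on different substrates get separate legend entries even though they share a "Die" value.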

Multiple x & y values

When plotting more than one DataFrame column on the y axis without a specific grouping column, a legend is also enabled to improve visualization of the data. The legend can be disabled by setting legend=False.

In [7]:
fcp.plot(df1, x='Voltage', y=['I [A]', 'Voltage'], lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2')
_images/grouping_22_0.png

Secondary x|y plots

Three legending options are available for plots with a secondary axis: (1) no legend; (2) legend based on the values of the primary/secondary axes; or (3) legend based on another column.

No legend

In [8]:
fcp.plot(df1, x='Voltage', y=['Voltage', 'I [A]'], twin_x=True, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Die=="(-1,2)"')
_images/grouping_26_0.png

Axis legend

In [9]:
fcp.plot(df1, x='Voltage', y=['Voltage', 'I [A]'], twin_x=True, legend=True, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Die=="(-1,2)"')
_images/grouping_28_0.png

Legend by another column

In [10]:
fcp.plot(df1, x='Voltage', y=['Voltage', 'I [A]'], twin_x=True, legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Die=="(-1,2)"')
_images/grouping_30_0.png

Location

By default, fivecentplots places the legend in the upper right corner of a figure, outside of the plot axes, as this is the only way to be certain the legend will never obscure the actual data. However, the legend can be placed inside the axes area by specifying the keyword legend_location with one of the following values (which roughly align with matplotlib syntax):

  • ‘outside’ or 0
  • ‘upper right’ or 1
  • ‘upper left’ or 2
  • ‘lower left’ or 3
  • ‘lower right’ or 4
  • ‘right’ or 5
  • ‘center left’ or 6
  • ‘center right’ or 7
  • ‘lower center’ or 8
  • ‘upper center’ or 9
  • ‘center’ or 10
  • ‘below’ or 11

Note: this feature is not currently supported with the bokeh plotting engine.
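
Since each location can be given as either a name or an integer, it can help to see the list above as a lookup table. The sketch below copies the name-to-code pairs from that list; the helper functions normalize_location and is_inside_axes are hypothetical and are not part of the fivecentplots API.

```python
# Name-to-code pairs copied from the list above
LEGEND_LOCATIONS = {
    'outside': 0, 'upper right': 1, 'upper left': 2, 'lower left': 3,
    'lower right': 4, 'right': 5, 'center left': 6, 'center right': 7,
    'lower center': 8, 'upper center': 9, 'center': 10, 'below': 11,
}


def normalize_location(value):
    """Hypothetical helper: return the integer code for a name or a code."""
    if isinstance(value, str):
        return LEGEND_LOCATIONS[value.lower()]
    if value in LEGEND_LOCATIONS.values():
        return value
    raise ValueError(f'unknown legend location: {value!r}')


def is_inside_axes(value):
    """Codes 1 through 10 place the legend inside the plot window."""
    return 1 <= normalize_location(value) <= 10
```

For example, normalize_location('upper left') and the integer 2 refer to the same placement, while 'outside' (0) and 'below' (11) are the two options that keep the legend off the data.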

Outside right

Placing the legend on the outside right of the plot ensures that the legend will not overlap the plotted data. fivecentplots uses this as a convenient default (i.e., no legend_location is required). There is no need to adjust bbox_to_anchor arguments as would be required with a pure matplotlib solution.

In [11]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/grouping_35_0.png

Inside the plot

Locations 1 through 10 will place the legend within the plot window. While this can result in a more compact figure, there is also a risk that the legend will obscure the data in the plot. Care should be exercised when placing the legend inside the plot.

In [12]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25', legend_location=2)
_images/grouping_38_0.png

Below the plot

Long legend values can be annoying to deal with. If the legend is placed outside the axis, the figure becomes very wide. If the legend is placed within the axis window, it can extend beyond the axis itself. fivecentplots provides the option of placing the legend “below” the plot:

In [13]:
df1.loc[df1.Die=='(1,1)', 'Long Legend'] = 'Sample #ABCDEFGHIJKLMNOPQRSTUVWXYZ'
df1.loc[df1.Die=='(2,-1)', 'Long Legend'] = 'Sample #RUNFORYOURLIFEWITHME'
df1.loc[df1.Die=='(-1,2)', 'Long Legend'] = 'Sample #THESKYISANEIGHBORHOOD!!!!!!!!!'
fcp.plot(df1, x='Voltage', y='I [A]', legend='Long Legend', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25', legend_location='below')
_images/grouping_41_0.png

Note on location

Relocating the legend is straightforward when dealing with a single plot axes, but the ideal placement is less clear for grid plots with legends (see examples below). By default, the legend will be placed in each subplot window. However, if the keyword nleg is included and set equal to 1, the legend will appear in the new location but only within the upper right subplot.

groups

The groups keyword requires a column name or list of column names from the DataFrame. These columns are used to identify subsets in the data and make sure they are properly visualized in the plot. Unlike legend, group values are not added to a legend on the plot.

xy plots

Some data sets contain multiple sets of similar data. Consider the following example where we plot all the data together and connect the points with lines. Notice how the line loops from the end of the data back to the beginning for each “group” of the data. This occurs because we have provided no information about how to segment the data set.

In [14]:
fcp.plot(df1, x='Voltage', y='I [A]', show=SHOW, legend='Temperature [C]',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2')
_images/grouping_48_0.png

To handle cases like this, we can add the keyword groups and specify another DataFrame column name that indicates how the data are grouped (in this case by “Die”). Now we get distinct lines for each instance of the measurement data. The groups keyword can still be combined with a legend from a different DataFrame column.

In [15]:
fcp.plot(df1, x='Voltage', y='I [A]', groups='Die', legend='Temperature [C]', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2')
_images/grouping_50_0.png
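
The effect of groups on line drawing can be illustrated with a plain pandas groupby. This is a sketch on invented data, not the fivecentplots implementation: it only shows why an ungrouped line jumps back to the start of the sweep and how splitting by a column yields one clean segment per group.

```python
import pandas as pd

# Two invented voltage sweeps stacked in one DataFrame
df = pd.DataFrame({
    'Die': ['(1,1)', '(1,1)', '(1,1)', '(2,-1)', '(2,-1)', '(2,-1)'],
    'Voltage': [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    'I [A]': [0.0, 0.1, 0.3, 0.0, 0.2, 0.5],
})

# Ungrouped: a single line through all rows. Every place where x decreases
# is a point where the drawn line loops back toward the start.
x = df['Voltage'].tolist()
loop_backs = sum(b < a for a, b in zip(x, x[1:]))

# Grouped: one (x, y) segment per unique value of the grouping column
segments = {
    die: sub[['Voltage', 'I [A]']].to_numpy().tolist()
    for die, sub in df.groupby('Die')
}
```

Here the ungrouped data contains one loop-back, while the grouped version gives two independent three-point segments, each of which can be drawn as its own line.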

groups also supports multiple column names. Here is the above plot without legending by “Temperature [C]”:

In [16]:
fcp.plot(df1, x='Voltage', y='I [A]', groups=['Die', 'Temperature [C]'], show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2')
_images/grouping_52_0.png

boxplots

As with xy plots, the groups keyword is used to break the data set into subsets. However, for boxplots the group column names and values are actually displayed along the x-axis:

In [17]:
df_box = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_data_box.csv'))
fcp.boxplot(df=df_box, y='Value', groups=['Batch', 'Sample'], legend='Region', show=SHOW)
_images/grouping_55_0.png

row | col subplots

By unique values

To see a larger subset of the main DataFrame, we can make a grid of subplots based on the unique values within DataFrame columns other than the primary x and y columns we are using. In this case, we remove the “Temperature [C]” and “Boost Level” columns from the filter keyword and pass them to the row and col keywords, respectively. This gives a grid of plots where each plot represents the intersection of a unique Boost Level and Temperature.

In [18]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', col='Boost Level', row='Temperature [C]', show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450', label_rc_font_size=14)
_images/grouping_59_0.png
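
The layout of such a grid can be sketched as the Cartesian product of the unique row and column values, with each cell holding the matching subset of rows. The data below is invented for illustration and the dictionary-of-subsets structure is an assumption made for clarity, not a description of fivecentplots internals.

```python
import itertools

import pandas as pd

# Invented sample rows: two temperatures x two boost levels, two points each
df = pd.DataFrame({
    'Temperature [C]': [25, 25, 25, 25, 75, 75, 75, 75],
    'Boost Level': [0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.4, 0.4],
    'Voltage': [0.0, 1.0] * 4,
})

row_values = sorted(df['Temperature [C]'].unique().tolist())
col_values = sorted(df['Boost Level'].unique().tolist())

# One subplot per (row value, column value) pair
grid = {
    (r, c): df[(df['Temperature [C]'] == r) & (df['Boost Level'] == c)]
    for r, c in itertools.product(row_values, col_values)
}
```

The grid has len(row_values) rows and len(col_values) columns, so adding one more unique temperature to the data would add a full row of subplots.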

By x or y

Alternatively, we can use row vs column grids to compare different x or y values. For example, we can plot two y-columns (one per row) by setting row='y'. Every subplot in this grid shares the same x-axis variable.

In [19]:
fcp.plot(df1, x='Voltage', y=['Voltage', 'I [A]'], legend='Die', col='Boost Level', row='y', show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450 & Temperature [C]==75',
         label_rc_font_size=14)
_images/grouping_62_0.png

We can make a similar plot with common row values and different x-axes by setting col='x':

In [20]:
fcp.plot(df1, x=['Voltage', 'I [A]'], y='Voltage', legend='Die', row='Boost Level', col='x', show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450 & Temperature [C]==75', label_rc_font_size=14)
_images/grouping_64_0.png

wrap subplots

By unique values

We can create a plot similar to the row/column plot above using the wrap keyword. The key differences are that spacing and ticks between subplots are removed by default (can be changed via keywords), axis sharing is forced (cannot be overridden), and the row/column value labels are condensed to a single label above each subplot. To do this, remove the row and col keywords from the previous function call and pass the “Temperature [C]” and “Boost Level” column names as a list to the wrap keyword.

In [21]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', wrap=['Temperature [C]', 'Boost Level'], show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450')
_images/grouping_68_0.png

By default, wrap plots will be arranged in a square grid. If the number of subplots does not match a perfectly square grid, you will have an incomplete row of subplots in the grid. All tick and axes labels are handled appropriately for this case. If you want to override the default grid size, specify the keyword ncol which sets the number of columns. The number of rows will be automatically determined from the number of subplots.

In [22]:
fcp.plot(df1, x='Voltage', y='I [A]', legend='Die', wrap=['Temperature [C]', 'Boost Level'], show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450', ncol=2)
_images/grouping_70_0.png
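
The grid sizing described above can be sketched with a few lines of arithmetic. This assumes that the default "square grid" means the number of columns is the ceiling of the square root of the subplot count; the exact rule fivecentplots applies is not stated here, and wrap_grid is a hypothetical helper name.

```python
import math


def wrap_grid(n_subplots, ncol=None):
    """Return (nrow, ncol) for a wrap plot with n_subplots panels.

    Assumption: with no ncol given, use the smallest square-ish grid
    that fits every subplot.
    """
    if ncol is None:
        ncol = math.ceil(math.sqrt(n_subplots))
    # Rows follow from the number of subplots and the column count
    nrow = math.ceil(n_subplots / ncol)
    return nrow, ncol


# Number of unused cells in the last row for a five-panel wrap plot
nrow, ncol = wrap_grid(5)
empty_cells = nrow * ncol - 5
```

Under this assumption, six subplots default to a 2 x 3 grid, while passing ncol=2 gives 3 rows of 2, and a five-panel plot leaves one cell of its last row empty.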

By x and y

We can also “wrap” by column names instead of unique column values. In this case we list the columns to plot in the x or y keywords as usual, but we set the wrap keyword to “x” or “y”. This will create a unique subplot in a grid for each wrap value. Unlike the case of wrapping column values, this plot will not have wrap labels and axis sharing can be overridden. As before, use ncol to override the default gridding. Unlike a row vs column plot by x and y, a wrap plot by x and y can be displayed without the extra grouping factor.

In [23]:
fcp.plot(df1, x='Voltage', y=['I Set', 'I [A]'], legend='Die', wrap='y', show=SHOW, groups=['Boost Level', 'Temperature [C]'],
         ax_size=[325, 325], filter='Substrate=="Si" & Target Wavelength==450')
_images/grouping_73_0.png

Or an alternatively styled version of this plot with tick and label sharing off:

In [24]:
fcp.plot(df1, x='Voltage', y=['I Set', 'I [A]'], legend='Die', wrap='y', show=SHOW, groups=['Boost Level', 'Temperature [C]'],
         ax_size=[525, 170], filter='Substrate=="Si" & Target Wavelength==450', ncol=1, ws_row=0,
         separate_labels=False, separate_ticks=False)
_images/grouping_75_0.png

figure plots

To add another dimension of grouping, fivecentplots supports grouping by figure. In this case, a separate figure (i.e., a separate png) is created for each unique value in the DataFrame column(s) listed in the fig_groups keyword. Here we will plot a single figure for each value of the “Die” column.

In [25]:
fcp.plot(df1, x='Voltage', y='I [A]', fig_groups='Die', wrap=['Temperature [C]', 'Boost Level'], show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450', show_filename=True,
         filename=r'C:\GitHub\fivecentplots\fivecentplots\tests\test_images\grouping.py\figure.png')
_images/grouping_78_0.png
_images/grouping_78_1.png
_images/grouping_78_2.png