plot

The plot function is the primary vehicle for visualizing x-y data. At a minimum, the plot function requires the following keywords:

  • df: a pandas DataFrame

  • x: the name of the DataFrame column containing the x-axis data

  • y: the name of the DataFrame column containing the y-axis data

Other optional keywords are described in Keyword Arguments.

Setup

Imports

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import fivecentplots as fcp
import pandas as pd
import numpy as np
import os, sys, pdb
osjoin = os.path.join
st = pdb.set_trace

Sample data

In [2]:
df = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_data.csv'))
df.head()
Out[2]:
Substrate Target Wavelength Boost Level Temperature [C] Die Voltage I Set I [A]
0 Si 450 0.2 25 (1,1) 0.0 0.0 0.0
1 Si 450 0.2 25 (1,1) 0.1 0.0 0.0
2 Si 450 0.2 25 (1,1) 0.2 0.0 0.0
3 Si 450 0.2 25 (1,1) 0.3 0.0 0.0
4 Si 450 0.2 25 (1,1) 0.4 0.0 0.0
In [3]:
ts = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_ts.csv'))

Set theme

(Only needs to be run once)

In [4]:
#fcp.set_theme('gray')
#fcp.set_theme('white')

Other

In [5]:
SHOW = False

XY Plots

Scatter

A simple XY plot with no lines or legend

In [6]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=False, inline=False,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')

Legend

Add a legend grouped by the “Die” column:

In [7]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_19_0.png

By default, the values in the legend are sorted (using so-called “natural” sorting via the natsort library, if present). To disable this and order the legend by the order in which each group first appears in the DataFrame, add the keyword sort=False:

In [8]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', show=SHOW, sort=False,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_21_0.png
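
To make the difference between the orderings concrete, here is a minimal standard-library sketch. It is not how natsort or fivecentplots implement sorting; natural_key and the label values are hypothetical and only illustrate why “natural” order differs from plain string order and from order of appearance.

```python
import re

def natural_key(label):
    """Split a label into text and integer chunks so numbers compare by value."""
    # re.split with a capturing group alternates text and digit chunks,
    # so odd positions always hold the digit runs.
    chunks = re.split(r'(\d+)', str(label))
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]

labels = ['die10', 'die2', 'die1']               # hypothetical legend values

plain_order = sorted(labels)                     # character-by-character order
natural_order = sorted(labels, key=natural_key)  # numbers compared as numbers
found_order = list(dict.fromkeys(labels))        # order of first appearance

print(plain_order)
print(natural_order)
print(found_order)
```

Plain sorting places 'die10' before 'die2' because it compares one character at a time, which is the behavior natural sorting avoids.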

Log scale

Log-scaled axes can be enabled through the keyword ax_scale. Valid options:

  • x-only: logx | semilogx

  • y-only: logy | semilogy

  • both: loglog | log

  • symlog: symlog (doesn’t work in matplotlib 1.5.1)

  • logit: logit

In [9]:
fcp.plot(df, x='Voltage', y='I [A]', ax_scale='loglog', legend='Die', show=SHOW, xmin=0.9, xmax=2.1, grid_minor=True,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_24_0.png

Categorical labels

Categorical DataFrame columns can also be plotted on x and/or y axes:

In [10]:
fcp.plot(df, x='Die', y='I [A]', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Voltage==1.5')
_images/plot_27_0.png

Time series

Similarly, a DataFrame containing time series data can be plotted:

In [11]:
fcp.plot(ts, x='Date', y='Happiness Quotient', markers=False, ax_size=[1000, 250])
_images/plot_30_0.png

Secondary x|y plots

Shared x-axis (twin_x)

In [12]:
fcp.plot(df, x='Voltage', y=['Voltage', 'I [A]'], twin_x=True, show=SHOW, legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Die=="(-1,2)"')
_images/plot_33_0.png

Shared y-axis (twin_y)

In [13]:
fcp.plot(df, x=['Voltage', 'I [A]'], y='Voltage', legend='Die', twin_y=True, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25 & Die=="(-1,2)"')
_images/plot_35_0.png

Multiple x & y values

Instead of sharing (or twinning) one independent axis across a primary and secondary dependent axis, we can plot multiple columns of data on the same dependent axis. In this case, all dependent values share the same limits on the plot.

Multiple y only

In [14]:
fcp.plot(df, x='Voltage', y=['Boost Level', 'I [A]'], legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_39_0.png

Multiple x only

In [15]:
fcp.plot(df, x=['Boost Level', 'I [A]'], y='Voltage', legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_41_0.png

Both

In [16]:
fcp.plot(df, x=['Boost Level', 'I [A]'], y=['Voltage', 'Temperature [C]'], legend='Die', show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_43_0.png

Grid plots

Row plot

Create a unique plot for each unique value of the “row” column in the specified DataFrame. Plots are arranged row by row in a single column. Each row contains a label indicating the unique value of the “row” column.

In [17]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', row='Boost Level', show=SHOW, ax_size=[225, 225],
         filter='Substrate=="Si" & Target Wavelength==450 & Temperature [C]==25')
_images/plot_47_0.png

Column plot

Create a unique plot for each unique value of the “col” column in the specified DataFrame. Plots are arranged in a single row, column by column. Each column contains a label indicating the unique value of the “col” column.

In [18]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', col='Boost Level', show=SHOW, ax_size=[225, 225],
         filter='Substrate=="Si" & Target Wavelength==450 & Temperature [C]==25')
_images/plot_50_0.png

Row x column grid

Remove “Temperature [C]” from the filter and make a grid of plots for each unique combination of “row” and “col” values. Both “row” and “col” labels are included.

In [19]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', col='Boost Level', row='Temperature [C]', show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450', label_rc_font_size=13)
_images/plot_53_0.png

Wrap plot

A wrap plot is an alternate view of the row x column plot. For wrap plots, the row and column labels are condensed into a single label above each plot window and, by default, the spacing between plots is eliminated (this can be overridden). The x and y sizes of the grid are determined by the square root of the total number of plot windows. The x- and y-axis ranges are shared.

In [20]:
fcp.plot(df, x='Voltage', y='I [A]', legend='Die', wrap=['Temperature [C]', 'Boost Level'], show=SHOW,
         ax_size=[225, 225], filter='Substrate=="Si" & Target Wavelength==450', label_rc_font_size=13)
_images/plot_56_0.png
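
As a rough illustration of sizing a grid from the square root of the number of plot windows, the sketch below computes a near-square layout. The rounding rule (round the column count up, then use as many rows as needed) is an assumption, and wrap_grid_shape is a hypothetical helper, not a fivecentplots function.

```python
import math

def wrap_grid_shape(n_plots):
    """Return (rows, cols) for a near-square grid holding n_plots windows."""
    # Integer ceiling of sqrt(n_plots); assumed rounding rule, see lead-in
    cols = math.isqrt(n_plots - 1) + 1
    rows = math.ceil(n_plots / cols)
    return rows, cols

for n in (1, 4, 6, 9):
    print(n, wrap_grid_shape(n))
```

Under this assumption, six unique wrap combinations would land in a grid of 2 rows by 3 columns.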

Other options

Horizontal and vertical lines

We can add horizontal and vertical reference lines to a plot using one or more of the following keywords: ax_hlines, ax_vlines, ax2_hlines, ax2_vlines. Here “hlines” are horizontal, “vlines” are vertical, “ax” lines are applied to the primary axis, and “ax2” lines are applied to a secondary axis, if it exists. Each keyword accepts one of:

  • a single float number
  • a list of float numbers
  • or a variable-length list of tuples (only first value is required):
    • item 1 (required) = x or y axis value of the line or the name of a DataFrame column from which the first entry in the column will be used
    • item 2: line color
    • item 3: line style
    • item 4: line width
    • item 5: line alpha
    • item 6: legend text (won’t appear in legend without this unless item 1 is a name of a DataFrame column)
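
The accepted forms listed above can be pictured with a small normalizer that expands every entry into the six documented items. This is only a sketch: LINE_FIELDS, normalize_line_spec, and normalize_lines are hypothetical names, and using None to mean “fall back to a default” is an assumption rather than the library's documented behavior.

```python
# The six documented tuple positions, in order
LINE_FIELDS = ('value', 'color', 'style', 'width', 'alpha', 'legend_text')

def normalize_line_spec(spec):
    """Expand one entry (a number, a column name, or a tuple) into a dict."""
    items = spec if isinstance(spec, tuple) else (spec,)
    # Only item 1 is required; missing trailing items become None.
    # Anything past the sixth position is ignored by this sketch.
    padded = tuple(items[:len(LINE_FIELDS)]) + (None,) * (len(LINE_FIELDS) - len(items))
    return dict(zip(LINE_FIELDS, padded))

def normalize_lines(value):
    """Accept a single value or a list of values/tuples and return a list of dicts."""
    specs = value if isinstance(value, list) else [value]
    return [normalize_line_spec(s) for s in specs]

hlines = normalize_lines([(0, '#FF0000', '--', 3, 1, 'Open'), 1.2])
vlines = normalize_lines(0.5)
print(hlines)
print(vlines)
```

Mixing tuples and bare numbers in one list works because each entry is normalized independently.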

Explicitly defined horizontal and vertical lines:

In [21]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW, legend=True,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         ax_hlines=[(0, '#FF0000', '--', 3, 1, 'Open', '#555555', 0.25), 1.2], ax_vlines=[0, (1, '#00FF00')])
_images/plot_61_0.png

Horizontal line using the first value of a DataFrame column:

In [22]:
df['Open'] = 0
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW, legend=True,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         ax_hlines=[('Open', '#FF0000', '--', 3, 1), 1.2], ax_vlines=[0, (1, '#00FF00')])
_images/plot_63_0.png

Curve fitting

We can add a polynomial fit line of a given degree to the plot with the keyword fit, with options to display the fit equation and R^2 value:

In [23]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         fit=4, fit_eqn=True, fit_rsq=True, fit_font_size=9)
_images/plot_66_0.png

We can disable the legend using legend=False:

In [24]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         fit=4, fit_eqn=True, fit_rsq=True, fit_font_size=9, legend=False)
_images/plot_68_0.png

We can also constrain the region of interest for the fit by supplying a start and stop value to either the fit_range_x or fit_range_y keywords:

In [25]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         fit=1, fit_eqn=True, fit_rsq=True, fit_font_size=9, fit_range_x=[1.3, 2])
_images/plot_70_0.png
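
For intuition about what a degree-1 fit over a restricted x range reports, the following plain-Python sketch computes a least-squares line and its R^2, first on all points and then only on points inside the range. The data values are made up, and linear_fit is a hypothetical helper; it does not reproduce the library's internal fitting code.

```python
def linear_fit(xs, ys, x_range=None):
    """Least-squares line y = slope*x + intercept, optionally using only x in x_range."""
    pairs = list(zip(xs, ys))
    if x_range is not None:
        lo, hi = x_range
        pairs = [(x, y) for x, y in pairs if lo <= x <= hi]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up diode-like data: flat at low voltage, then linear
voltage = [0.0, 0.5, 1.0, 1.3, 1.5, 1.7, 2.0]
current = [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.7]

full = linear_fit(voltage, current)
windowed = linear_fit(voltage, current, x_range=[1.3, 2.0])
print(full)
print(windowed)
```

Restricting the range to the linear region raises R^2 because the flat low-voltage points no longer pull the line away from the data it is meant to describe.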

We can also add a line of fit for each item in a legend:

In [26]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW, legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         fit=1, fit_range_x=[1.3, 2], fit_width=2, fit_style='--')
_images/plot_72_0.png

If we use the same column for both the legend and the grouping (here, wrap) and select a single color for the fit line, the legend will collapse to show only one instance of “Fit”:

In [27]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW, wrap='Die', legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         fit=1, fit_range_x=[1.3, 2], fit_width=2, fit_color='#555555', ax_size=[250, 250])
_images/plot_74_0.png

Stat lines

Rather than drawing connecting lines between all points, you can draw a line through a representative statistic of the data, such as the “median” value of the points, using the stat keyword (any stats that can be applied to a pandas groupby object can be used). For cases where the x-values are not identical for all data sets, use the keyword stat_val and specify an alternative x-axis to use for the statistical calculation (the actual plotted x-axis will be whatever is specified for the x keyword).

First consider a plot of Voltage vs I [A]. Because the actual measured current values vary from the current set point value from measurement to measurement, the x-values for each data set are not identical. The stat line thus computes the median value for each x value where there may only be a single data point. The result is ugly and not useful.

In [28]:
fcp.plot(df, x='I [A]', y='Voltage', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         stat='median')
_images/plot_77_0.png

Instead, add the column name for the column containing the set point value of I [A] to the keyword stat_val:

In [29]:
fcp.plot(df, x='I [A]', y='Voltage', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         stat='median', stat_val='I Set')
_images/plot_79_0.png
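
The effect of grouping on the set point rather than on the measured value can be sketched in plain Python. The numbers are made up, and taking the median of both the measured x and the y values within each set-point group is an assumption about how the stat line is built; in pandas the same grouping idea corresponds to a groupby on the set-point column followed by median().

```python
from collections import defaultdict
from statistics import median

# (set point, measured current, voltage): made-up values for three repeated sweeps
rows = [
    (0.1, 0.098, 1.40), (0.1, 0.101, 1.42), (0.1, 0.103, 1.44),
    (0.2, 0.197, 1.50), (0.2, 0.200, 1.53), (0.2, 0.204, 1.52),
]

# Grouping on the measured current gives one point per group, so the "median" is just the data
n_groups_measured = len({i_meas for _, i_meas, _ in rows})

# Grouping on the set point collects the repeated measurements together
groups = defaultdict(list)
for i_set, i_meas, volt in rows:
    groups[i_set].append((i_meas, volt))

stat_line = [
    (median(i for i, _ in pts), median(v for _, v in pts))
    for _, pts in sorted(groups.items())
]
print(n_groups_measured)
print(stat_line)
```

With the set point as the grouping key, six scattered points reduce to two representative points, one per set-point value.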

Stat lines also work when multiple DataFrame columns are plotted on a given axis. For example, consider the following with two values on the y-axis:

In [30]:
fcp.plot(df, x='Voltage', y=['Boost Level', 'I [A]'], show=SHOW, legend=True, stat='median',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25')
_images/plot_81_0.png

Confidence intervals

It is possible to display confidence intervals on a data set. By default these are shown as partially transparent filled regions around a curve. The actual confidence interval to display is set by the conf_int keyword. Allowed values are any interval between 0 and 1 or the special value range which plots a region from the min to the max value of the data at a given point. The example below shows a 95% confidence interval:

In [31]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', lines=False, show=SHOW,
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         conf_int=0.95)
_images/plot_84_0.png
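
As a sketch of the two kinds of region described above, the code below computes a band for one group of y values that share an x value. The formula is a normal-approximation interval on the mean (mean ± z·s/√n), which is an assumption; the library may compute its band differently. Passing 'range' as a string, the interval helper, and the readings are likewise illustrative.

```python
from statistics import NormalDist, mean, stdev

def interval(values, conf_int=0.95):
    """Return (low, high) for one group of y values sharing an x value."""
    if conf_int == 'range':
        # Special case: span from the smallest to the largest observation
        return min(values), max(values)
    # Two-sided normal quantile, about 1.96 for conf_int=0.95
    z = NormalDist().inv_cdf(0.5 + conf_int / 2)
    half_width = z * stdev(values) / len(values) ** 0.5
    center = mean(values)
    return center - half_width, center + half_width

currents = [0.98, 1.00, 1.02, 1.01, 0.99]   # made-up repeated readings at one voltage
low, high = interval(currents, 0.95)
full_low, full_high = interval(currents, 'range')
print(low, high)
print(full_low, full_high)
```

Repeating this calculation at every x value and filling between the low and high curves gives a shaded band of the kind shown in the plot.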

Reference line

We can add a reference line to the plot using the keyword ref_line. The value of this keyword is: (1) an equation applied to an existing DataFrame column; or (2) a simple int or float. This value is used to create a new column in the DataFrame that is plotted vs. the value of x.

y=x reference

In [32]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', show=SHOW, legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         ref_line=df['Voltage'], ref_line_legend_text='y=x', xmin=0, ymin=0, xmax=1.6, ymax=1.6)
_images/plot_88_0.png

We can also add multiple reference lines to a single plot:

In [33]:
df['2*Voltage'] = 2*df['Voltage']
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', show=SHOW, legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         xmin=0, ymin=0, xmax=1.6, ymax=1.6,
         ref_line=['Voltage', '2*Voltage'], ref_line_legend_text=['y=x', 'y=2*x'], ref_line_style=['-', '--'], ref_line_color=[5,6])
_images/plot_90_0.png

More complex calculation

Now let’s use the fit equation calculated above and add the fit line as a reference line (notice that because we are not specifying an existing column in the DataFrame as the ref_line and we are not specifying ref_line_legend_text, the legend defaults to a generic label “Ref Line”):

In [34]:
fcp.plot(df, x='Voltage', y='I [A]', title='IV Data', show=SHOW, legend='Die',
         filter='Substrate=="Si" & Target Wavelength==450 & Boost Level==0.2 & Temperature [C]==25',
         ref_line=1.555*df['Voltage']**4-3.451*df['Voltage']**3+2.347*df['Voltage']**2-0.496*df['Voltage']+0.014)
_images/plot_93_0.png