Useful Python Snippets

by Boxplot Oct 26, 2019

The goal of this blog post is a compilation of little tidbits and code snippets that address common issues when programming for data analysis in Python.

General Snippets

Difference between JSON and XML

This page gives a great example of the difference between data in JSON format and XML format. It shows the exact same data in both formats: https://json.org/example.html

Converting scientific notation into numbers

Converting from scientific notation in a Pandas Dataframe: https://re-thought.com/how-to-suppress-scientific-notation-in-pandas/

Remove ellipses from pandas dataframe preview:

pd.options.display.max_columns = 2000

#If you don't want to make the change permanently for the notebook, 
(e.g., to avoid excessive output in other cells), you can also use 
pd.option_context:

with pd.option_context('display.max_columns', 2000):
     print(df.describe())
#temporarily display all columns
with pd.option_context('display.max_seq_items', None):
    print (df.columns)

Isolate date columns:

datecols2 = []

for item in prod.columns:
    if 'Date' in item:
        datecols2.append(item)
        
datecols2

Add grand total column to a pivot table:

test_df = pd.pivot_table(prod, index="Color", columns="Class", values="ListPrice", aggfunc=np.sum)
test_df['Grand Total'] = test_df.sum(axis=1)
test_df

Comparing group by syntax to pivot table syntax:

prod.groupby(['Class', 'Style']).count()[['Name']]
pd.pivot_table(prod, index=['Class', 'Style'], values="Name", aggfunc="count")

Use apply for multiple columns in a dataframe:

avo.apply(lambda row: row.AveragePrice * row['Total Volume'], axis=1)

Choose an argument for the open function (file i/o)

Install packages in Jupyter Notebook

# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install pytime

Understanding copying objects in python

These two links are excellent at explaining:
https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list
and
https://www.geeksforgeeks.org/copy-python-deep-copy-shallow-copy/

Reverse Dictionary Function

def reverse_dict(lookup_value):
    dictionary = {'george' : 16, 'amber' : 19}
    for key, value in dictionary.items():  
        if value == lookup_value:
            print(key)
            
reverse_dict(19)

What are args and kwargs?

https://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained/

Removing duplicate rows in a dataframe

https://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.drop_duplicates.html

https://jamesrledoux.com/code/drop_duplicates

This is an extremely important pandas doc page! Indexing & slicing dataframes

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

Selecting & looping through parts of a dataframe

https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

Connect to a mySQL database:

import mysql.connector
# Set up your connection to the database
myConnection = mysql.connector.connect( host=
[PUT YOUR HOST NAME HERE], user=[PUT YOUR USERNAME HERE], 
passwd=[PUT YOUR PASSWORD HERE], db=[PUT THE DATABASE NAME HERE] )

# Read the results of a SQL query into a pandas data frame.
my_table = pd.read_sql('SELECT * FROM table_name, con=myConnection)

Connect to postgres database:

import psycopg2
connection = psycopg2.connect(user = 
"your-username-here-keep-quotes",
password = "your-password-here-keep-quotes",
host = "your-host-here-keep-quotes",
port = "5432",
database = "your-database-here-keep-quotes")
cursor = connection.cursor()
cursor.execute("SELECT * FROM django_session;")
record = cursor.fetchone()
print(record)

Connect to the Twitter API

import numpy as np
import pandas as pd
import json
from pandas.io.json import json_normalize
import twitter

# You need to replace all the capital words in brackets with your
# ACTUAL keys. The quotation marks stay but the brackets and 
# capital words must go. 
api = twitter.Api(consumer_key='[CONSUMER KEY GOES HERE]',
                  consumer_secret='[CONSUMER SECRET GOES HERE]',
                  access_token_key='[ACCESS TOKEN KEY GOES HERE]',
                  access_token_secret='[ACCESS TOKEN SECRET]')

# Get the tweet data since that last tweet
# The user_id is for Boxplot's timeline, replace it with your 
# own if you'd like!
user_timeline = api.GetUserTimeline(user_id='959273870023905280')
latest_twitter_data_final = pd.DataFrame()

for i in range(len(user_timeline)):
    rowasdf = \ json_normalize(json.loads(json.dumps(user_timeline[i]._json))) \
    latest_twitter_data_final = pd.concat([latest_twitter_data_final, \ 
rowasdf]).reset_index(drop=True)

latest_twitter_data_final

Loop through a Series and make sure that each subsequent value is greater than or equal to the one before it. If not, set the value equal to the one before it:

previous_value = 0

def previous(current):
   global previous_value
   if current < previous_value:    
       return_value = previous_value
#        previous_value = current
   else:
       return_value = current
   
   previous_value = return_value
   return return_value
choc['Rating'].head(10).apply(previous)

Remove white space in a column

df = pd.DataFrame({'a':[' app le ']})
print(len(df.a[0]))
df.a = df.a.str.strip()
len(df.a[0])

Customizing Matplotlib Visualizations

How to customize the range of the x-axis and rotate the tick marks:

awesome_table1 = pd.pivot_table(data, index='DEGFIELD3', 
columns='REGION2', values='CBSERIAL', aggfunc='count')
#len(list(awesome_table1.index))
awesome_table1.plot(figsize=(18,12));
plt.xticks(range(0,13),list(awesome_table1.index),rotation=-45)

Also see:
https://stackoverflow.com/questions/12608788/changing-the-tick-frequency-on-x-or-y-axis-in-matplotlib
and

https://stackoverflow.com/questions/27671748/how-to-print-y-axis-label-horizontally-in-a-matplotlib-pylab-chart

https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib

Distance between axes-label and axes in matplotlib

Create reusable settings for a chart:

def my_scatterplot(x_txt, y_txt, df, colorcol):
    df.plot(kind='scatter', x=x_txt, y=y_txt, c=colorcol, colormap='winter', figsize=(10,4), s=10, alpha=.5)
    
my_scatterplot('Total Bags', 'AveragePrice', avo, 'type_as_num')

Change the size of all charts in a notebook:

# put this at the top of the notebook:
plt.rcParams["figure.figsize"] = [15, 10]

Example of changing colors and marker types in scatterplots:

colors = ['b', 'c', 'y', 'm', 'r']

en = plt.scatter(books_data[books_data['language_code']=='en']
['average_rating'], books_data[books_data['language_code']=='en']
['ratings_count'], marker='x', color=colors[0])

spa = plt.scatter(books_data[books_data['language_code']=='spa']
['average_rating'], books_data[books_data['language_code']=='spa']
['ratings_count'], color=colors[2])

fre  = plt.scatter(books_data[books_data['language_code']=='fre']
['average_rating'], books_data[books_data['language_code']=='fre']
['ratings_count'], marker='o', color=colors[1])

# a  = plt.scatter(random(10), random(10), marker='o', 
color=colors[2])
# h  = plt.scatter(random(10), random(10), marker='o', 
color=colors[3])
# hh = plt.scatter(random(10), random(10), marker='o', 
color=colors[4])
# ho = plt.scatter(random(10), random(10), marker='x', 
color=colors[4])

plt.legend((en,spa,fre),
          ('English', 'Spanish', 'French'),
          scatterpoints=1,
          loc='lower left',
          ncol=3,
          fontsize=8)

plt.show()

Multiple y axes, and forced axis

# This is one data point we're trying to plot
chart1 = sets.groupby('year')['num_parts'].count() 
# This is the other data point we're trying to plot
chart2 = sets.groupby('year')['num_parts'].mean() 
fig, ax = plt.subplots(sharey='col')  

# Create a MatPlotLib figure & subplot
ax2 = ax.twinx() # ax2 shares X axis with the ax Axes object
#  This is what forces scale on the second Y-axis!!
ax2.set_ylim(bottom=0, top=799)  

# graph both of our data sets, one bar, one line
ax.bar(chart1.index, chart1, color='dodgerblue')
ax2.plot(chart2.index, chart2, color='red')

# Set the size of the resulting figure
fig.set_size_inches(12,8)

Other Visualizations

A great tutorial for mapping:

https://towardsdatascience.com/mapping-geograph-data-in-python-610a963d2d7f

Side by side boxplots with seaborn:

Set x and y axes for seaborn plots:

ax = sns.barplot(x = 'val', y = 'cat',
              data = fake,
              color = 'black')
ax.set(xlabel='common xlabel', ylabel='common ylabel')
plt.show()

Make a word cloud in the shape of a custom image:

# If using Jupyter Notebook, you need to install the 
# wordcloud module like this

import sys
 !{sys.executable} -m pip install wordcloud

# import the libraries needed

from PIL import Image
 import numpy as np
 import pandas as pd
 from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
 import matplotlib.pyplot as plt

#  import your dataset
prod = pd.read_csv('winemag-data-130k-v2.csv')

# import the image mask
wine_mask = np.array(Image.open("wine_mask.png"))

# generate the word cloud
comment_words = ''

stopwords = set(STOPWORDS)

stopwords.update(["drink", "now", "wine", "flavor", "flavors"])

for val in prod.description.iloc[0:1000]:

    val = str(val)

    tokens = val.split(' ')

for i in range(len(tokens)):
    tokens[i] = tokens[i].lower()
    #print(tokens[i])

for word in tokens:
    comment_words = comment_words + ' ' + word

wordcloud = WordCloud(background_color="floralwhite", 

                      max_words=1000,

                      mask=wine_mask,

                      stopwords=stopwords, 

                      contour_width=3,

                      contour_color='floralwhite').generate(comment_words)

plt.figure(figsize = (48,48), facecolor = None)

plt.imshow(wordcloud, interpolation="bilinear")

plt.axis('off')

plt.tight_layout(pad=0)

plt.title("Frequent Words from Tasters - Wine Form",fontsize = 40,color='gray')

plt.show()

Make a single box and whisker plot with Matplotlib:

fig, axs = plt.subplots(1, 1)
 axs.boxplot(prod['points'])
 axs.set_title('basic plot')
plt.show()

Multiple box and whisker plots with Matplotlib:

# To make side by side box and whisker plots (in this example, get 
# points for each country, and then make a list of those lists). 
# That is what is passed in to the boxplot function:

u = list(prod[prod['country']=='Italy']['points'])
m = list(prod[prod['country']=='Portugal']['points'])
w = list(prod[prod['country']=='Germany']['points'])
final_list = []
final_list.append(u)
final_list.append(m)
final_list.append(w)

fig7, ax7 = plt.subplots()
ax7.set_title('Multiple Samples with Different sizes')
ax7.boxplot(final_list);

Pie Chart:

prod.country_other.value_counts().plot(kind='pie',
   autopct='%1.0f%%', colors=['skyblue', 'lavender', 'lightpink',
   'lightcyan', 'lemonchiffon', 'mistyrose'])
plt.legend(title = 'Wine Country of Origin', loc='best', bbox_to_anchor=(1, 0, 0.5, 1))
plt.figure(figsize=(360, 250))

Bubble plot tutorial

https://python-graph-gallery.com/270-basic-bubble-plot/

Make a bubble plot in excel

Calculating correlation with a scatterplot

https://stackoverflow.com/questions/41635448/how-can-i-draw-scatter-trend-line-on-matplot-python-pandas/41635626

<< Previous Post

"A/B Testing Example (Two Proportion Hypothesis Test)"

Next Post >>

"Intro To SQL"

Useful Python Snippets

Useful Python Snippets

by Boxplot Oct 26, 2019

General Snippets

Difference between JSON and XML

Converting scientific notation into numbers

Remove ellipses from pandas dataframe preview:

Isolate date columns:

Add grand total column to a pivot table:

Comparing group by syntax to pivot table syntax:

Use apply for multiple columns in a dataframe:

Choose an argument for the open function (file i/o)

Install packages in Jupyter Notebook

Understanding copying objects in python

Reverse Dictionary Function

What are args and kwargs?

Removing duplicate rows in a dataframe

This is an extremely important pandas doc page! Indexing & slicing dataframes

Selecting & looping through parts of a dataframe

Connect to a mySQL database:

Connect to postgres database:

Connect to the Twitter API

Loop through a Series and make sure that each subsequent value is greater than or equal to the one before it. If not, set the value equal to the one before it:

Remove white space in a column

Customizing Matplotlib Visualizations

How to customize the range of the x-axis and rotate the tick marks:

Create reusable settings for a chart:

Change the size of all charts in a notebook:

Example of changing colors and marker types in scatterplots:

Multiple y axes, and forced axis

Other Visualizations

A great tutorial for mapping:

Side by side boxplots with seaborn:

Set x and y axes for seaborn plots:

Make a word cloud in the shape of a custom image:

Make a single box and whisker plot with Matplotlib:

Multiple box and whisker plots with Matplotlib:

Pie Chart:

Bubble plot tutorial

Calculating correlation with a scatterplot

<< Previous Post

Next Post >>

Need help applying these concepts to your organization's data?

Chat with us about options.

Schedule a Meeting

Continue to make data-driven decisions.