Exploratory data analysis on NSW Roads Fines

Data Exploration of Australia NSW Fines dataset


The dataset describes the fines issued in Australia NSW between January 2012 and November 2017. The dataset is available to download from Dept of NSW Revenu

The exploration is set out to understand the insights that the overall severity of fines. And hopefully, we are able to identify the likelihood of kind of offences for each category of drivers.

Specifically, we are going to answer following questions:

# invite people for the science party
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
from matplotlib.dates import MonthLocator, DateFormatter
from scipy import stats
import matplotlib.pyplot as plt
from wordcloud import WordCloud
pd.set_option('max_column', 100)
pd.set_option('max_row', 250)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

Specify the correct data type for categorical and date columns so it consumes much less memory and enables correct ordering on date column

df = pd.read_csv(
    'penalty_data_set_2.csv', 
    dtype={
        'CAMERA_IND': 'category',
        'SCHOOL_ZONE_IND': 'category',
        'SPEED_IND': 'category',
        'POINT_TO_POINT_IND': 'category',
        'RED_LIGHT_CAMERA_IND': 'category',
        'SPEED_CAMERA_IND': 'category',
        'SEATBELT_IND': 'category',
        'MOBILE_PHONE_IND': 'category',
        'PARKING_IND': 'category',
        'CINS_IND': 'category',
        'FOOD_IND': 'category',
        'BICYCLE_TOY_ETC_IND': 'category',
    }, parse_dates=['OFFENCE_MONTH'], date_parser=lambda x: pd.datetime.strptime(x, '%d/%m/%Y'))

Show what columns are present in the dataset

df.columns
Index(['OFFENCE_FINYEAR', 'OFFENCE_MONTH', 'OFFENCE_CODE', 'OFFENCE_DESC',
       'LEGISLATION', 'SECTION_CLAUSE', 'FACE_VALUE', 'CAMERA_IND',
       'CAMERA_TYPE', 'LOCATION_CODE', 'LOCATION_DETAILS', 'SCHOOL_ZONE_IND',
       'SPEED_BAND', 'SPEED_IND', 'POINT_TO_POINT_IND', 'RED_LIGHT_CAMERA_IND',
       'SPEED_CAMERA_IND', 'SEATBELT_IND', 'MOBILE_PHONE_IND', 'PARKING_IND',
       'CINS_IND', 'FOOD_IND', 'BICYCLE_TOY_ETC_IND', 'TOTAL_NUMBER',
       'TOTAL_VALUE'],
      dtype='object')

What each column means

|Columns              | Type    | Description                                                                 |
|---------------------|---------|-----------------------------------------------------------------------------|
|OFFENCE_FINYEAR      | String  | The financial year of the penalty notice                                    |
|OFFENCE_MONTH        | Date    | The first day of the month of the penalty notice                            |
|OFFENCE_CODE         | Integer | A unique identifier for the offence                                         |
|OFFENCE_DESC         | String  | The description of the offence                                              |
|LEGISLATION          | String  | The legislation creating the offence                                        |
|SECTION_CLAUSE       | String  | The relevant section or clause of the legislation                           |
|FACE_VALUE           | Number  | The penalty amount for the offence ($)                                      |
|CAMERA_IND           | String  | An indicator for 'Camera detected' offences                                 |
|CAMERA_TYPE          | String  | The type of camera                                                          |
|LOCATION_CODE        | String  | A unique identifier for the location of a camera                            |
|LOCATION_DETAILS     | String  | The street and suburb of a camera, and the direction in which it operates   |
|SCHOOL_ZONE_IND      | String  | An indicator for 'School Zone' offences                                     |
|SPEED_BAND           | String  | The speed range (also indicates red light offences)                         |
|SPEED_IND            | String  | An indicator for 'Speeding' offences                                        |
|POINT_TO_POINT_IND   | String  | An indicator for 'Point to Point Camera' offences                           |
|RED_LIGHT_CAMERA_IND | String  | An indicator for 'Red Light Camera' offences                                |
|SPEED_CAMERA_IND     | String  | An indicator for 'Speed Camera' offences                                    |
|SEATBELT_IND         | String  | An indicator for 'Seatbelt' offences                                        |
|MOBILE_PHONE_IND     | String  | An indicator for 'Mobile Phone' offences                                    |
|PARKING_IND          | String  | An indicator for 'Parking' offences                                         |
|CINS_IND             | String  | An indicator for 'Criminal Infringement Notice Scheme' offences             |
|FOOD_IND             | String  | An indicator for 'Food Safety' offences                                     |
|BICYCLE_TOY_ETC_IND  | String  | An indicator for 'Bicycle, Wheeled Toy and other Non-Motor Vehicle' offences|
|TOTAL_NUMBER         | Integer | The total number of penalty notices issued                                  |
|TOTAL_VALUE          | Number  | The total face value of penalty notices issued                              |
print('The dataset covers the records from {0} to {1}'.format(
    df['OFFENCE_MONTH'].min(), df['OFFENCE_MONTH'].max()
))
The dataset covers the records from 2012-01-01 00:00:00 to 2017-11-01 00:00:00
print('There are total {0} unique offence codes in the dataset'.format(len(df['OFFENCE_CODE'].unique())))
There are total 6396 unique offence codes in the dataset

Are we being fined more often than before?

The line chart shows the evolution on number of fines that had been issued for the time range. It shows there has been a climb in the number of fines issued from April 2013 (around 3000 fines monthly), peaked at 4200 fines in October 2014. And it maintained a average of 4000 fines from the beginning of 2015.

fig, ax = plt.subplots(1, figsize=(22, 6), )
monthly_fine_dist = df.groupby('OFFENCE_MONTH').agg({'TOTAL_VALUE': 'count', }).reset_index()
plt.plot('OFFENCE_MONTH', 'TOTAL_VALUE', data=monthly_fine_dist, linewidth=3)
months = MonthLocator(range(1, 13), bymonthday=1, interval=3)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsFmt)
_ = plt.xticks(rotation=45)
_ = plt.tick_params(axis='both', which='major', labelsize=16)
_ = plt.title('Monthly number of fines issued', fontweight='bold', size=21)

png

What offenses are most likely to happen?

It looks like 9 out of 10 offences are about speeding. Proceeded while on red lights ranked the 6th.

offence_code_lookup = df[['OFFENCE_CODE', 'OFFENCE_DESC']].drop_duplicates()
most_common_fines = df.groupby('OFFENCE_CODE').agg({'OFFENCE_MONTH': 'count'}).sort_values(by='OFFENCE_MONTH', ascending=False)[:10].reset_index()
most_common_fines = most_common_fines.merge(offence_code_lookup, how='left' )
most_common_fines
OFFENCE_CODE OFFENCE_MONTH OFFENCE_DESC
0 82950 20398 Motor vehicle exceed speed limit - over 10 km/...
1 83063 20042 Motor vehicle exceed speed limit - 10 km/h and...
2 74703 14359 Exceed speed limit over 10km/h - Camera Detected
3 74701 13912 Exceed speed limit 10km/h and under - Camera D...
4 82995 8600 Motor vehicle exceed speed limit - over 20 km/...
5 74705 6498 Exceed speed limit over 20km/h - Camera Detected
6 83406 6227 Proceed through red traffic light - Camera rec...
7 33063 4621 Motor vehicle exceed speed limit - 10 km/h and...
8 83491 4254 Proceed through red traffic arrow - Camera rec...
9 74731 4232 Proceed through red traffic light - Camera Det...
wordcloud = WordCloud(max_font_size=60, width=800, height=400).generate(' '.join(df['OFFENCE_DESC'].tolist()))
plt.figure(figsize=(20,10))
plt.imshow(wordcloud, interpolation='bilinear')
_ = plt.axis("off")

png

The offense patterns

The offense patten chart below shows that the substantial increase in the number of fines were because of the jump from speeding cameras! They must turned on lots of them!

We also observed that there is a growing trend in the number of fines from red light camera near the ending of the dataset and they seem in a linear relationship to the numbers of offences in school zones.

fine_category_fields = [
    'SCHOOL_ZONE_IND',
    'SPEED_IND',
    'POINT_TO_POINT_IND',
    'RED_LIGHT_CAMERA_IND',
    'SPEED_CAMERA_IND',
    'SEATBELT_IND',
    'MOBILE_PHONE_IND',
    'PARKING_IND',
    'CINS_IND',
    'FOOD_IND',
    'BICYCLE_TOY_ETC_IND'
]
_offence_trend = {}
for i, _field in enumerate(fine_category_fields):
    _offence_trend[_field] = df[df[_field] == 'Y'].groupby('OFFENCE_MONTH').agg({'FACE_VALUE': 'count'}).reset_index().sort_values('OFFENCE_MONTH').iloc[:,-1].tolist()

df_trend = pd.DataFrame(_offence_trend)
df_trend['months'] = df['OFFENCE_MONTH'].map(lambda x: x.strftime("%b '%y")).unique()


fig, ax = plt.subplots(1, figsize=(20, 12))
for column in df_trend.drop('months', axis=1):
    plt.plot(df_trend[column], marker='',  linewidth=2, label=column)


plt.legend(loc=1, ncol=1)
plt.title("Number of offences for each category", fontsize=18, fontweight='bold')
plt.xlabel("Months", size=14)
plt.ylabel("Number of fines", size=14)
_ = ax.set_xticklabels(df_trend['months'])

png

Severity of speeding

Since speeding count towards a major portion of offences, it would be good to see how much faster are drivers are driving than allowed speed

fig, ax = plt.subplots(1, figsize=(22, 6), )
_speed_bands = df[df['SPEED_BAND'].notnull()]['SPEED_BAND']
sns.countplot(_speed_bands)
ax.set_title('Different speeding categories', size=18, fontweight='bold')
plt.xlabel("Speeding subcategories", size=14)
plt.ylabel("Number of fines", size=14)
Text(0,0.5,'Number of fines')

png

The fine amount for each category of fines?

Most fines are of below AU$ 500, and of course there are a lot outliers that have been fined above AU$ 2000

fine_cat_to_id = {}
for i, _field in enumerate(fine_category_fields):
    fine_cat_to_id[i] = _field
    df[_field + '_CODES'] = df[_field].cat.codes

fine_category_code_fields = list(map(lambda x: x+ '_CODES', fine_category_fields))
df['FINE_CATEGORY_ID'] = df[fine_category_code_fields].apply(lambda x: x.argmax(), raw=True, axis=1)
df['FINE_CATEGORY_TEXT'] = df['FINE_CATEGORY_ID'].apply(lambda x: fine_cat_to_id[x])
fig, ax = plt.subplots(1, figsize=(12, 5))
sns.boxplot(x="FINE_CATEGORY_ID", y="FACE_VALUE", data=df, ax=ax, showfliers=False)
_ = ax.set_xticklabels(fine_category_fields, rotation=30)

png

Which location are easy to cause offence

At the time of this analysis was written, the camera location data published does not have the camera code, so it cannot be displayed on a map.