Welcome to the CSC Q&A.
Get help and give help!
Write great code!
It is our choices... that show what we truly are, far more than our abilities.

Categories

+11 votes

Does anyone know how to convert the heights in Lab 3? For some reason in python mine still showed up as the date like it would in excel and I am not sure how to change the format.

asked in DATA360_Spring2019 by (8 points)

1 Answer

+8 votes

If it's showing up as a date in Python, that probably means that you opened it up in Excel and then RE-SAVED IT from Excel, which is a problem. (In my opinion, Excel is a dangerous tool for dealing with data sets. It can be helpful to let you view/explore small amounts of data, but you have to be careful because it could be mangling your data in unanticipated ways, so you probably shouldn't save changes!)

You may need to go back to an earlier copy of your raw CSV file that you scraped, or you may need to go back even further and re-scrape the athletic rosters again.

Now, once you get back to where Python is showing heights as "5-6", then you need to create a function in Python that does a little string processing.

I'll help get you started, which is a bit tricky because you need to avoid processing the NaN (null) values for all the missing data in the height data column.

def heightStringToInches(hText):
    if pd.notnull(hText):
         #split the string into feet and inches
         footText, inchesText = hText.split('-')
         # finally, convert both those strings into numbers 
         # and do the math, and **return** the correct result

Once you've defined that function, it's a simple matter of calling the apply method on the height column of your data set and providing heightStringToInches as the function that we want to apply (instead of the len function that we did earlier in the example involving varsity letters).

answered by (113 points)
+3

When I try to define the function and apply that to the new column, I got an error "Not enough values to unpack (expected 2, got 1)". How could I fix that?

+2

Two possible issues:

  1. If you replaced all the null values in this column with an empty string, then when you try to split the empty string on '-', you only get one string back, instead of two. To fix this, you can change your if statement to be:

     if len(hText) >= 3:
    
  2. If you didn't replace null values with empty strings, there's a different problem. It turns out there are a few values in the original dataset where the height field contains only '-' (just a single hyphen, with no numbers on either side).

The best way to fix this is probably to replace '-' with a null entry during your dataset cleanup phase, after combining the height columns. Something like this:

import numpy as np
datFrame['height'].replace('-', np.nan, inplace=True)
...