Fixing various typos
wood-chris committed Mar 18, 2024
1 parent 47d85cd commit 4daa271
Showing 3 changed files with 40 additions and 17 deletions.
16 changes: 9 additions & 7 deletions _episodes/03-starting-with-data.md
@@ -307,14 +307,12 @@ Let's look at the data using these.
> 2. `waves_df.shape` Take note of the output of `shape` - what format does it
> return the shape of the DataFrame in?
> HINT: [More on tuples here][python-datastructures]
>
> 3. `waves_df.head()` Also, what does `waves_df.head(15)` do?
> 4. `waves_df.tail()`
>
> > ## Solution
> >
> >
> >
> > 1.
> > ~~~
> > Index(['record_id', 'buoy_id', 'Name', 'Date', 'Tz', 'Peak Direction', 'Tpeak',
> > 'Wave Height', 'Temperature', 'Spread', 'Operations', 'Seastate',
@@ -323,7 +321,7 @@ Let's look at the data using these.
> > ~~~
> > {: .output}
> >
> >
> > 2.
> >
> > ~~~
> > (2073, 13)
@@ -332,7 +330,7 @@ Let's look at the data using these.
> >
> > It is a _tuple_
> >
> >
> > 3.
> >
> > ~~~
> > record_id buoy_id ... Seastate Quadrant
@@ -346,6 +344,8 @@ Let's look at the data using these.
> > {: .output}
> >
> > So, `waves_df.head()` returns the first 5 rows of the `waves_df` DataFrame. (Your Jupyter Notebook might show all columns.) `waves_df.head(15)` returns the first 15 rows; i.e. the _default_ value (recall the functions lesson) is 5, but we can change this via an argument to the function.
> >
> > 4.
> >
> > ~~~
> > record_id buoy_id Name ... Operations Seastate Quadrant
@@ -418,7 +418,9 @@ array(['SW Isles of Scilly WaveNet Site', 'Hayling Island Waverider',
> in this case, the result is the same but when might the difference be important?
>
> > ## Solution
> > 1.
> >
> > 1.
> >
> > ~~~
> > buoy_ids = pd.unique(waves_df["buoy_id"])
> > print(buoy_ids)
@@ -430,7 +432,7 @@ array(['SW Isles of Scilly WaveNet Site', 'Hayling Island Waverider',
> > ~~~
> > {: .output}
> >
> > 2.
> > 2.
> >
> > We could count the number of elements of the list, or we might think about using either the `len()` or `nunique()` functions, and we get 10.
> >
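
The rest of this solution is not shown in the hunk, so the following is only a hedged illustration of one case where `len(pd.unique(...))` and `nunique()` can disagree: missing values. The Series `ids` is a made-up stand-in, not data from `waves_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical buoy ids with one missing value (not taken from waves_df).
ids = pd.Series([14, 7, 7, np.nan, 5])

# pd.unique keeps NaN as one of the distinct values it returns.
n_from_len = len(pd.unique(ids))

# Series.nunique skips NaN unless dropna=False is passed.
n_from_nunique = ids.nunique()
n_with_nan = ids.nunique(dropna=False)

print(n_from_len, n_from_nunique, n_with_nan)
```

With no missing values in the column, as appears to be the case for `buoy_id` here, the two approaches give the same count.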
39 changes: 30 additions & 9 deletions _episodes/05-index-slice-subset.md
@@ -182,8 +182,9 @@ a = [1, 2, 3, 4, 5]
>> 3. The error is raised because the list a has no element with index 5: it has only five entries, indexed from 0 to 4.
>> 4. `a[len(a)]` also raises an IndexError. `len(a)` returns 5, making `a[len(a)]` equivalent to `a[5]`.
>> To retrieve the final element of a list, use the index -1, e.g.
>>
>> ~~~
>> a[-5]
>> a[-1]
>> ~~~
>> {: .language-python}
>>
@@ -336,6 +337,8 @@ using either label or integer-based indexing.
they are interpreted as a *label*.
- `iloc` is primarily *integer* based indexing

Our dataset has **labels** for columns, but **indexes** for rows.

To select a subset of rows **and** columns from our DataFrame, we can use the
`iloc` method. For example, for the first 3 rows, we can select record_id, name, and date (columns 0, 2,
and 3 when we start counting at 0), like this:
@@ -376,7 +379,8 @@ waves_df.loc[[0, 10, 35549], :]
{: .language-python}

**NOTE 1**: with our dataset, we are using integers even when using `loc` because our DataFrame index
(which is the unnamed first column) is composed of integers - but Pandas converts these to strings
(which is the unnamed first column) is composed of integers - `loc` treats these integers as labels, not as positions. If you had a column of
strings that you wanted to use as row labels, you would first need to make that column the index using the `set_index` function.

**NOTE 2**: Labels must be found in the DataFrame or you will get a `KeyError`.
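
As a minimal sketch of the two notes above, the following uses a small made-up DataFrame (`toy_df`, with invented site names and temperatures) rather than the real `waves_df`. It shows `loc` with integer labels, `loc` with string labels after `set_index`, and the `KeyError` raised for a missing label.

```python
import pandas as pd

# Hypothetical miniature stand-in for waves_df; the values are invented.
toy_df = pd.DataFrame(
    {
        "buoy_id": [14, 7, 5],
        "Name": ["Site A", "Site B", "Site C"],
        "Temperature": [10.8, 11.2, 9.9],
    }
)

# With the default integer index, loc takes integer labels.
first_and_last = toy_df.loc[[0, 2], :]
print(first_and_last)

# set_index returns a new DataFrame whose row labels are the "Name" strings.
by_name = toy_df.set_index("Name")
site_b_temp = by_name.loc["Site B", "Temperature"]
print(site_b_temp)

# A label that is not present in the index raises a KeyError.
try:
    by_name.loc["Site Z"]
except KeyError as err:
    print("KeyError:", err)
```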

@@ -412,20 +416,35 @@ gives the **output**
Remember that Python indexing begins at 0. So, the index location [2, 6]
selects the element that is 3 rows down and 7 columns over (Tpeak) in the DataFrame.

It is worth noting that rows are selected when using `loc` with a single list of
labels (or `iloc` with a single list of integers). However, unlike `loc` or `iloc`,
indexing a data frame directly with labels will select columns (e.g.
It is worth noting that:

- using `loc` with a single list of labels (if the rows are labelled) returns rows
- using `iloc` with a single list of integers also returns rows

_but_

- indexing a data frame directly with labels will select columns (e.g.
`waves_df[['buoy_id', 'Name', 'Temperature']]`), while ranges of integers will
select rows (e.g. waves_df[0:13]) - but passing a single integer will raise an error.
Direct indexing of rows is redundant with using `iloc`, and will raise a `KeyError` if a single integer or list is used:
select rows (e.g. waves_df[0:13])

Passing a single integer when trying to index a dataframe will raise an error.

Similarly, direct indexing of rows is redundant with using `loc`, and will raise a `KeyError` if a single integer or list is used:

~~~
# produces an error - even though you might think it looks sensible
waves_df.loc[1:10,1]
# instead, use this:
waves_df.loc[1:10, "buoy_id"]
# or
waves_df.iloc[1:10, 1]
~~~
{: .language-python}



The error will also occur if index labels are used without `loc` (or column labels used
with it).
A useful rule of thumb is the following:
@@ -456,8 +475,10 @@ arrays)
>
>> ## Solution
>>
>>
>> 1.
>>
>> - `waves_df[0:3]` returns the first three rows of the DataFrame:
>>
>> ~~~
>> record_id buoy_id Name Date Tz ... Temperature Spread Operations Seastate Quadrant
>> 0 1 14 SW Isles of Scilly WaveNet Site 17/04/2023 00:00 7.2 ... 10.8 26.0 crew swell west
@@ -489,7 +510,7 @@ arrays)
>> `waves_df.iloc[0:4, 1:4]` selects specified columns of the first four rows
>> `waves_df.loc[0:4, 1:4]` results in a 'TypeError' - see below.
>>
>> While iloc uses integers as indices and slices accordingly, loc works with labels. It is like accessing values from a dictionary, asking for the key names. Column names 1:4 do not exist, so the call to `loc` above results in an error. Check also the difference between `waves_df.loc[0:4]` and `waves_df.iloc[0:4]`.
>> While `iloc` uses integers as indices and slices accordingly, `loc` works with labels. It is like accessing values from a dictionary, asking for the key names. Column names 1:4 do not exist, so the call to `loc` above results in an error. Check also the difference between `waves_df.loc[0:4]` and `waves_df.iloc[0:4]`.
> {: .solution}
{: .challenge}
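
The difference between `loc[0:4]` and `iloc[0:4]` mentioned in the solution can be sketched on a small made-up DataFrame (`toy_df` is an assumption for illustration, not the lesson's data): a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position.

```python
import pandas as pd

# Hypothetical ten-row DataFrame with the default integer index 0..9.
toy_df = pd.DataFrame({"record_id": range(1, 11), "buoy_id": [14] * 10})

# loc slices by label and includes the end label: rows 0, 1, 2, 3 and 4.
by_label = toy_df.loc[0:4]

# iloc slices by position and excludes the end position: rows 0, 1, 2 and 3.
by_position = toy_df.iloc[0:4]

print(len(by_label), len(by_position))
```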

2 changes: 1 addition & 1 deletion _episodes/08-geopandas.md
@@ -211,8 +211,8 @@ scotland.overlaps(cairngorms.iloc[0].geometry)
>> # ...and get the names
>> scotland.loc[overlaps].local_authority
>> ~~~
>>
>> {: .language-python}
>>
>> ~~~
>> disjoints = scotland.disjoint(cairngorms.iloc[0].geometry)
>> # get a Series of only the disjoints