Fixing various typos

edcarp · Mar 18, 2024 · 4daa271
1 parent 47d85cd
commit 4daa271

Showing 3 changed files with 40 additions and 17 deletions.
diff --git a/_episodes/03-starting-with-data.md b/_episodes/03-starting-with-data.md
@@ -307,14 +307,12 @@ Let's look at the data using these.
 > 2. `waves_df.shape` Take note of the output of `shape` - what format does it
 >    return the shape of the DataFrame in?
 >    HINT: [More on tuples here][python-datastructures]
->
 > 3. `waves_df.head()` Also, what does `waves_df.head(15)` do?
 > 4. `waves_df.tail()`
 >
 > > ## Solution
 > >
-> > 
-> >
+> > 1. 
 > > ~~~
 > > Index(['record_id', 'buoy_id', 'Name', 'Date', 'Tz', 'Peak Direction', 'Tpeak',
 > >    'Wave Height', 'Temperature', 'Spread', 'Operations', 'Seastate',
@@ -323,7 +321,7 @@ Let's look at the data using these.
 > > ~~~
 > > {: .output}
 > >
-> > 
+> > 2.
 > >
 > > ~~~
 > > (2073, 13)
@@ -332,7 +330,7 @@ Let's look at the data using these.
 > >
 > > It is a _tuple_
 > >
-> > 
+> > 3.
 > >
 > > ~~~
 > >   record_id  buoy_id  ... Seastate Quadrant
@@ -346,6 +344,8 @@ Let's look at the data using these.
 > > {: .output}
 > >
 > > So, `waves_df.head()` returns the first 5 rows of the `waves_df` dataframe. (Your Jupyter Notebook might show all columns). `waves_df.head(15)` returns the first 15 rows; i.e. the _default_ value (recall the functions lesson) is 5, but we can change this via an argument to the function.
+> >
+> > 4.
 > > 
 > > ~~~
 > >       record_id  buoy_id              Name  ... Operations  Seastate  Quadrant
@@ -418,7 +418,9 @@ array(['SW Isles of Scilly WaveNet Site', 'Hayling Island Waverider',
 >    in this case, the result is the same, but when might the difference be important?
 > 
 > > ## Solution
-> > 1.  
+> > 
+> > 1.
+> >
 > > ~~~
 > > buoy_ids = pd.unique(waves_df["buoy_id"])
 > > print(buoy_ids)
@@ -430,7 +432,7 @@ array(['SW Isles of Scilly WaveNet Site', 'Hayling Island Waverider',
 > > ~~~
 > > {: .output}
 > > 
-> > 2.  
+> > 2.
 > > 
 > > We could count the number of elements of the list, or we might think about using either the `len()` or `nunique()` functions, and we get 10.
 > >

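Note on the `03-starting-with-data.md` solutions above: a minimal sketch of what the numbered steps return, assuming a small made-up DataFrame. `demo_df` is a hypothetical stand-in for the lesson's `waves_df`, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for waves_df; the values are invented.
demo_df = pd.DataFrame({
    "record_id": range(1, 21),
    "buoy_id": [14, 7, 5, 3] * 5,
})

shape = demo_df.shape         # a tuple: (number of rows, number of columns)
first_rows = demo_df.head()   # no argument, so the default of 5 rows is used
first_15 = demo_df.head(15)   # the argument overrides the default
last_rows = demo_df.tail()    # the last 5 rows

# Two ways to count the distinct buoy ids
buoy_ids = pd.unique(demo_df["buoy_id"])     # array of the distinct values
n_by_len = len(buoy_ids)
n_by_nunique = demo_df["buoy_id"].nunique()  # skips missing values by default

print(shape, len(first_rows), len(first_15), n_by_len, n_by_nunique)
```

With no missing values the two counts agree. They can differ when the column contains `NaN`, because `nunique()` excludes missing values by default while `pd.unique()` keeps `NaN` as one of the returned values.
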
diff --git a/_episodes/05-index-slice-subset.md b/_episodes/05-index-slice-subset.md
@@ -182,8 +182,9 @@ a = [1, 2, 3, 4, 5]
 >> 3. The error is raised because the list a has no element with index 5: it has only five entries, indexed from 0 to 4.
 >> 4. `a[len(a)]` also raises an IndexError. `len(a)` returns 5, making `a[len(a)]` equivalent to `a[5]`.
 >>     To retrieve the final element of a list, use the index -1, e.g.
+>> 
 >> ~~~
->> a[-5]
+>> a[-1]
 >> ~~~
 >> {: .language-python}
 >>
@@ -336,6 +337,8 @@ using either label or integer-based indexing.
   they are interpreted as a *label*.
 - `iloc` is primarily *integer* based indexing
 
+Our dataset has **labels** for columns, but **indexes** for rows.
+
 To select a subset of rows **and** columns from our DataFrame, we can use the
 `iloc` method. For example, for the first 3 rows, we can select record_id, name, and date (columns 0, 2,
 and 3 when we start counting at 0), like this:
@@ -376,7 +379,8 @@ waves_df.loc[[0, 10, 35549], :]
 {: .language-python}
 
 **NOTE 1**: with our dataset, we are using integers even when using `loc` because our DataFrame index
-(which is the unnamed first column) is composed of integers - but Pandas converts these to strings
+(which is the unnamed first column) is composed of integers - and `loc` treats these integers as labels, not positions. If you had a column of
+strings that you wanted to index using labels, you would need to make that column the index using the `set_index` method.
 
 **NOTE 2**: Labels must be found in the DataFrame or you will get a `KeyError`.
 
@@ -412,20 +416,35 @@ gives the **output**
 Remember that Python indexing begins at 0. So, the index location [2, 6]
 selects the element that is 3 rows down and 7 columns over (Tpeak) in the DataFrame.
 
-It is worth noting that rows are selected when using `loc` with a single list of
-labels (or `iloc` with a single list of integers). However, unlike `loc` or `iloc`,
-indexing a data frame directly with labels will select columns (e.g. 
+It is worth noting that:
+
+ - using `loc` with a single list of labels (if the rows are labelled) returns rows
+ - using `iloc` with a single list of integers also returns rows
+
+ _but_
+
+-  indexing a data frame directly with labels will select columns (e.g. 
 `waves_df[['buoy_id', 'Name', 'Temperature']]`), while ranges of integers will
-select rows (e.g. waves_df[0:13]) - but passing a single integer will raise an error.
-Direct indexing of rows is redundant with using `iloc`, and will raise a `KeyError` if a single integer or list is used:
+select rows (e.g. `waves_df[0:13]`)
+
+Passing a single integer when trying to index a dataframe will raise an error.
+
+Similarly, direct indexing of rows is redundant with using `loc`, and will raise a `KeyError` if a single integer or list is used:
 
 ~~~
 # produces an error - even though you might think it looks sensible
 waves_df.loc[1:10,1]
+
+# instead, use this:
+waves_df.loc[1:10, "buoy_id"]
+
+# or
+waves_df.iloc[1:10, 1]
 ~~~
 {: .language-python}
 
 
+
 the error will also occur if index labels are used without `loc` (or column labels used
 with it).
 A useful rule of thumb is the following: 
@@ -456,8 +475,10 @@ arrays)
 >
 >> ## Solution
 >>
->> 
+>> 1.
+>>
 >>   - `waves_df[0:3]` returns the first three rows of the DataFrame:
+>>
 >> ~~~
 >>    record_id  buoy_id                             Name              Date   Tz  ...  Temperature  Spread  Operations  Seastate  Quadrant
 >> 0          1       14  SW Isles of Scilly WaveNet Site  17/04/2023 00:00  7.2  ...         10.8    26.0        crew     swell      west
@@ -489,7 +510,7 @@ arrays)
 >>  `waves_df.iloc[0:4, 1:4]` selects specified columns of the first four rows
 >>  `waves_df.loc[0:4, 1:4]` results in a 'TypeError' - see below.
 >>
->> While iloc uses integers as indices and slices accordingly, loc works with labels. It is like accessing values from a dictionary, asking for the key names. Column names 1:4 do not exist, so the call to `loc` above results in an error. Check also the difference between `waves_df.loc[0:4]` and `waves_df.iloc[0:4]`.
+>> While `iloc` uses integers as indices and slices accordingly, `loc` works with labels. It is like accessing values from a dictionary, asking for the key names. Column names 1:4 do not exist, so the call to `loc` above results in an error. Check also the difference between `waves_df.loc[0:4]` and `waves_df.iloc[0:4]`.
 > {: .solution}
 {: .challenge}
 

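Note on the `05-index-slice-subset.md` changes above: a minimal sketch of the `loc` / `iloc` / direct-indexing rules, assuming a small made-up DataFrame. `demo_df` is a hypothetical stand-in for `waves_df`, with invented values and the default integer row index.

```python
import pandas as pd

# Hypothetical stand-in for waves_df; the values are invented.
demo_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "buoy_id": [14, 14, 7, 7, 5],
    "Name": ["A", "A", "B", "B", "C"],
})

by_label = demo_df.loc[1:3, "buoy_id"]   # label slice: the end label is included
by_position = demo_df.iloc[1:3, 1]       # positional slice: the end is excluded
columns = demo_df[["buoy_id", "Name"]]   # direct indexing with labels selects columns
rows = demo_df[0:3]                      # direct indexing with a slice selects rows

# There is no column labelled 1, so loc cannot find it.
try:
    demo_df.loc[1:3, 1]
    loc_error = None
except KeyError as err:
    loc_error = type(err).__name__

# A single integer is looked up as a column label when indexing directly.
try:
    demo_df[1]
    direct_error = None
except KeyError as err:
    direct_error = type(err).__name__

# To select rows by string labels, make that column the index first.
named = demo_df.set_index("Name")
rows_b = named.loc["B"]

print(len(by_label), len(by_position), loc_error, direct_error, len(rows_b))
```

Note that `loc[1:3]` returns three rows while `iloc[1:3]` returns two, which is the difference the challenge asks readers to check with `waves_df.loc[0:4]` and `waves_df.iloc[0:4]`.
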
diff --git a/_episodes/08-geopandas.md b/_episodes/08-geopandas.md
@@ -211,8 +211,8 @@ scotland.overlaps(cairngorms.iloc[0].geometry)
 >> # ...and get the names
 >> scotland.loc[overlaps].local_authority
 >> ~~~
->>
 >> {: .language-python}
+>>
 >> ~~~
 >> disjoints = scotland.disjoint(cairngorms.iloc[0].geometry)
 >> # get a Series of only the disjoints