Turning data into choropleth maps with Python and Folium #604

Open
hawc2 opened this issue Mar 19, 2024 · 50 comments

Comments

@hawc2
Collaborator

hawc2 commented Mar 19, 2024

Programming Historian in English has received a proposal for an original lesson, 'Turning data into choropleth maps with Python and Folium,' by @adamlporter.

I have circulated this proposal for feedback within the English team. We have considered this proposal for:

  • Openness: we advocate for use of open source software, open programming languages and open datasets
  • Global access: we serve a readership working with different operating systems and varying computational resources
  • Multilingualism: we celebrate methodologies and tools that can be applied or adapted for use in multilingual research-contexts
  • Sustainability: we're committed to publishing learning resources that can remain useful beyond present-day graphical user interfaces and current software versions

We are pleased to have invited @adamlporter to develop this Proposal into a Submission under the guidance of @nabsiddiqui.

The Submission package should include:

  • Lesson text (written in Markdown)
  • Figures: images / plots / graphs (if using)
  • Data assets: codebooks, sample dataset (if using)

We ask @adamlporter to share their Submission package with our Publishing team by email, copying in the editors.

We've agreed a submission date of early April. We ask @adamlporter to contact us if they need to revise this deadline.

When the Submission package is received, our Publishing team will process the new lesson materials, and prepare a Preview of the initial draft. They will post a comment in this Issue to provide the locations of all key files, as well as a link to the Preview where contributors can read the lesson as the draft progresses.

If we have not received the Submission package by April, @nabsiddiqui will attempt to contact @adamlporter. If we do not receive any update, this Issue will be closed.

Our dedicated Ombudspersons are Ian Milligan (English), Silvia Gutiérrez De la Torre (español), Hélène Huet (français), and Luis Ferla (português). Please feel free to contact them at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudspersons will have no impact on the outcome of any peer review.

@charlottejmc
Collaborator

charlottejmc commented May 10, 2024

Hello @nabsiddiqui and @adamlporter,

You can find the key files here:

You can review a preview of the lesson here:


While processing this new lesson, I encountered a couple of queries which I’d like to ask for your help with:

Could you please confirm whether these are needed for the lesson? If so, we can host them too.

  • I'm seeing two images with extremely long (and slightly obscure) links at lines 1275 and 1279 of the Markdown file. Would you like to keep them in? I can save them locally, then upload and embed them using our liquid syntax instead. However, we are committed to principles of minimal compute, which includes ensuring our pages remain as light as possible. Since these images are not crucial to understanding the lesson (and illustrate a rather easy step!), I wonder whether you would mind if we simply removed them altogether.

Thank you! ✨

@anisa-hawes
Contributor

anisa-hawes commented May 10, 2024

Thank you for setting this up, @charlottejmc!

--

Hello @adamlporter,

Thank you for your patience and collaboration. I apologise for my confusion about how the Colab notebook you created intersects with your lesson. As I mentioned in my email, we are relatively new to handling codebooks within our lessons. The key objective of the guideline notes I shared with you is to make sure learning actions are accessible to all (whatever development environment they choose to work in) and to make sure we are able to sustain the lesson (performing technical maintenance if it is needed) in the future. What I hadn't understood on a first quick reading, but understand better now, is that this lesson is specifically Colab-based. For that reason, I'd like to suggest that we foreground this fact -- I'd be keen to hear @nabsiddiqui and @hawc2's thoughts, but I might suggest including Colab in the title of this lesson.

The general approach we've developed encourages authors to think of codebooks as a supporting 'asset' that aggregates all the code used in the lesson, allowing readers to run it directly in the cloud, no matter their local technical specifications (as you very neatly express in the lesson). In this case (a lesson specifically centred on using Colab as a workspace for creating the choropleth maps), I wonder if the accompanying codebook could be pared back completely so that it contains only the code (+ essential line comments) and does not replicate any of the lesson text. We suggest keeping headings and subheadings in place, mirroring the lesson's heading structure to support readers' navigation. We've made a new copy of the codebook which is hosted within our Google workspace. (I'll send you an invitation to edit that copy now).

If we re-centre the lesson as read on our website (Markdown + inline code) in this way, some small copyediting adjustments will be needed throughout. For example, 'in the next cell' would be replaced with 'in the code block below'. Charlotte and I would be happy to help with this.

Alongside this, some small changes would also be necessary in the section titled Colab to clarify how you intend readers to use the Colab notebook to facilitate the learning actions taught by the lesson. Something like: 'We have set up a Google Colab notebook for you to use as you work through this lesson [...]'. At that point in the text, we'll share a link to the codebook (hosted on our organisational workspace).


One final idea which you may have thought about previously (and which I'm sure you will discuss with Nabeel as you develop the lesson together) is how Colab enables you to show the maps as interactive visualisations. If that is a feature you'd like to explain within the lesson, you could consider creating some .gif animations that express the interactivity. We can handle .gif files in exactly the same way we handle static images within our liquid syntax. So if you want to replace one or two of the static images with animations, we'd be happy to swap the files for you.

--

A couple of other very small notes about the adjustments we made to the Markdown file during set-up:

  • I transformed the HTML tables you've created into Markdown tables according to our conventions (commit ad51031). I've checked each one, but I'd like to ask you to double-check that they render as you intended. We can adjust as needed.
  • We've added placeholder alt_text + captions for each of the figure images, which we can work together to populate as the images are finalised.
  • I think Charlotte's question (above) about the two images which render in the Markdown preceded by ![image.png](data:image [...] is probably caused by my confusion too... You mentioned that you inserted the images of Colab's file/save functions directly into your notebook, and that the embedded image code transferred to the Markdown. Did you intend that these two screenshots of the file menu would be included as figures in the lesson? We can slot them into the sequence if you want to include them. My sense is that a sentence describing the action of opening the file menu could be just as effective (and preferable in terms of accessibility).
    --

@adamlporter
Collaborator

adamlporter commented May 12, 2024 via email

@adamlporter
Collaborator

adamlporter commented May 12, 2024 via email

@charlottejmc
Collaborator

charlottejmc commented May 15, 2024

Thanks very much @adamlporter,

I've uploaded the additional assets for this lesson here, including the Fatal Force documentation (README.md).

  • Because Fatal Force's dataset is continuously updated, I think it will be important to clarify within the lesson that data taken from the original GitHub repository will be different to the data in the .csv file, which I have captured and uploaded to the assets folder today.

  • Both the Fatal Force and US Census datasets are open source, but we'll still want to reference them accurately using endnotes. If you have a suggested citation for them please do reply in a comment, or we can work on it together.

  • I also just want to make sure you double-check the code at line 268 now, as I suspect it might need a slight adjustment if readers are using the cb_2021_us_county_5m.zip file from our repo, rather than the online link.

We've also removed the two screenshots that illustrated downloading the files.

Thanks again,

Charlotte ✨

@anisa-hawes anisa-hawes moved this from 1 Submission to 2 Initial Edit in Active Lessons May 15, 2024
@anisa-hawes
Contributor

anisa-hawes commented May 15, 2024

Thank you, @adamlporter (and @charlottejmc!)


Hello @nabsiddiqui,

You'll note that a few general thoughts and queries were raised during the Submission set-up, to which Adam has responded. Charlotte has noted a few practical details which we will be happy to collaborate on with Adam during Phase 3 Revision 1.

This Submission is ready for your Initial Edit! ✨


What's happening now?

Hello @adamlporter. Your lesson has been moved to the next phase of our workflow which is Phase 2: Initial Edit.

In this Phase, your editor Nabeel @nabsiddiqui will read your lesson, and provide some initial feedback. Nabeel will post feedback and suggestions as a comment in this Issue, so that you can revise your draft in the following Phase 3: Revision 1.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 1 <br> Submission
Who worked on this? : Publishing Assistant (@charlottejmc)
All  Phase 1 tasks completed? : Yes
Section Phase 2 <br> Initial Edit
Who's working on this? : Editor (@nabsiddiqui)
Expected completion date? : June 12
Section Phase 3 <br> Revision 1
Who's responsible? : Author (@adamlporter)
Expected timeframe? : ~30 days after feedback is received

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@nabsiddiqui
Collaborator

Thank you, @adamlporter, @charlottejmc, @anisa-hawes. I plan to get this done sometime in the next week or two. I will let you know if I need anything in the meantime.

@nabsiddiqui
Collaborator

Hello @adamlporter,

Thank you for this wonderful article. My initial edits and thoughts are below:

General Comments

Comments are based on the paragraph numbers on the draft, which can be seen here: https://programminghistorian.github.io/ph-submissions/en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.

Overall, I really enjoyed the article and think it is a good explanation of Folium. I felt the organization was well done. I've divided my comments into these general comments and line edits.

  1. In the section discussing the different ways to create a scale variable, I do not believe it is necessary to talk about all three methods. You mention that there are issues with capping the scale, and you already discuss many of the issues that come with it, so I would not include that scale information. Instead, I would remove these methods and just use the log scale. You will then need to also remove the part about
  2. You should mention the issue of normalizing data for choropleth maps earlier on. It is fine not to include this in the main text, but there should be more information early on about the issues that come from not normalizing the data, stating that there is information in the Appendix for those interested in looking at it.
  3. When the map is displayed throughout the article, the "m" is easy to miss in the code. Rename this variable to something more descriptive so that it is easier to notice.
  4. Since the data changes, it is useful to create a prominent note early on that readers may not get the results that are shown in the lesson. You could also “halt” the data at some point, upload it to the Programming Historian GitHub, and use that throughout the tutorial.
  5. The LaTeX code is not working throughout the article. This is largely due to our renderer I believe. @anisa-hawes would know more. We may need to have images of the formula.
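The log-scale approach recommended in item 1 can be sketched in a few lines. This is a minimal illustration, not the lesson's code: it assumes a per-county count table shaped like the lesson's `map_df` with a `count` column, and the rows below are toy values rather than Fatal Force data.

```python
import numpy as np
import pandas as pd

# Toy per-county counts (illustrative values only, not Fatal Force data).
map_df = pd.DataFrame({
    "FIPS": ["01001", "06037", "48201", "17031"],
    "count": [1, 250, 90, 10],
})

# A log10 column compresses the long upper tail, so a few high-count
# counties no longer wash out the colour scale for every other county.
# log10(0) is -inf, so this assumes every row has a count of at least 1.
map_df["MapScale"] = np.log10(map_df["count"])

print(map_df)
```

Shading the map by `MapScale` instead of `count` means each colour step covers a tenfold change, which is why the legend then needs its labels translated back into raw counts.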

Line Edits

Introduction

Paragraph 1-The link to the Covid-19 infection/death rates leads to a page that requires you to log in to ArcGIS; combine with paragraph 2

Paragraph 3-The last sentence isn’t necessary here as it is explained in the rest of the tutorial

Paragraph 4-“How to” -> “Know how to”; “One issue” -> “Understand that a key issue”

Paragraph 6: Combine sentences such as: “To create the maps, we will use Folium, a Python library…”

Folium

Paragraph 8-“CSS and Javascript,” -> “CSS and Javascript — if you need some help, see Programming Historian’s article….”

Paragraph 10-“advanced features:” -> “advanced features, such as…”

Colab

Paragraph 19-Paragraph is not needed as Paragraphs 18 and 22 cover all this

Paragraph 24-“In the next cell” -> “In the first cell”; when I ran the code in Google Colab, it said that the requirements were already met. You may want to double-check whether Colab still requires this to be installed. If not, you can remove this and the earlier wording about geopandas.
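If an install step stays in the lesson, one way to make it harmless where GeoPandas is already present (as seems to be the case in Colab) is to check before installing. This is a minimal standard-library sketch; the helper name `ensure_installed` is hypothetical, and it assumes the pip package name matches the import name, which holds for `geopandas` but not for every package.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package: str) -> bool:
    """Install `package` with pip only if it cannot already be imported.

    Returns True if the package was already available, False if pip was run.
    Assumes the pip name and the import name are the same.
    """
    if importlib.util.find_spec(package) is not None:
        return True
    # Using `sys.executable -m pip` installs into the interpreter running this code.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return False
```

Calling `ensure_installed("geopandas")` would then be a no-op on environments that already ship the library.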

Fatal Force Data

Paragraph 29-Let people know that the sample data they get will likely be different from the data displayed in the lesson

County Geometry Data

Paragraph 47-The first sentence is confusing. Match what data in ff_df? Be specific; “They are combined in the” -> “The numbers are combined in the”
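For paragraph 47, it may help to show concretely what matching involves: a five-digit county FIPS code is the two-digit state code followed by the three-digit county code, and the codes only match if both sides keep their leading zeros. A minimal sketch with toy rows; the column names `STATEFP` and `COUNTYFP` are assumptions standing in for whatever the county geometry table actually uses.

```python
import pandas as pd

# Toy rows standing in for the county geometry table (column names assumed).
counties = pd.DataFrame({"STATEFP": ["01", "06"], "COUNTYFP": ["001", "037"]})

# The five-digit county FIPS code is the state code followed by the county code.
counties["FIPS"] = counties["STATEFP"] + counties["COUNTYFP"]

# If a FIPS column was read as integers, the leading zeros are lost
# (1001 instead of "01001"), so convert back to zero-padded five-character strings.
ff_fips = pd.Series([1001, 6037])
ff_fips_str = ff_fips.astype(str).str.zfill(5)

print(counties["FIPS"].tolist())  # ['01001', '06037']
print(ff_fips_str.tolist())       # ['01001', '06037']
```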

Preparing the Data

Paragraph 56-“Doesn’t have a” -> “Doesn’t have an”

Define the Question

Paragraph 67-“Now that we have (1) a DF with data…” -> “Now that we have a DF with data (ff_df) and a DF with county geometries (counties) that share a common field (FIPS), we are ready to draw a map.”

Paragraph 71-“someone being killed by a police officer” -> “a police officer killing someone”
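Because the map depends entirely on the shared FIPS field, a quick check that the two tables actually share keys can catch problems before drawing. A minimal sketch with invented toy values; only the names `ff_df`, `counties` and `FIPS` come from the suggested wording, and the real tables will have many more columns.

```python
import pandas as pd

# Toy stand-ins for the lesson's two DataFrames (values are illustrative).
ff_df = pd.DataFrame({"FIPS": ["01001", "06037", "06037", "99999"]})
counties = pd.DataFrame({"FIPS": ["01001", "06037", "48201"]})

data_keys = set(ff_df["FIPS"])
geo_keys = set(counties["FIPS"])

# Codes in the data with no matching county shape cannot be drawn on the map.
unmatched = sorted(data_keys - geo_keys)
print(f"{len(unmatched)} FIPS code(s) with no geometry: {unmatched}")
```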

Draw the Map

Paragraph 76-This paragraph is not needed. I feel the fact you are using Folium is already implied in the previous paragraph

Paragraph 78-Might want to put another sentence here that says you will explain the code more below and that the reader should just paste the code for now; The code that displays the map at the end is easy to miss. It may be easier to break it into its own code chunk.

Paragraph 80-“choroplet” -> “choropleth”

Paragraph 81-I would recommend getting rid of the note and the following code and bullet point. You already mentioned earlier that it is not necessary to have the columns be the same name.

The Problem of Uneven Distribution of Data

Paragraph 83-Remove second sentence as you mention it again in paragraph 86

Paragraph 96-Dollar signs are present at the end of this. I think this is to show LaTeX? I don’t believe this is rendering properly.

Paragraph 98-At the end of the paragraph, write the following “We will look at two different solutions: The

Solution 1: Fisher-Jenks algorithm

Paragraph 100-May need to get rid of the GeoPandas part if it turns out you don’t actually need this.

Paragraph 101-Again the “m” at the end is easy to miss. Make it a separate code block.

Paragraph 104-“but especially at the lower end of the scale…” -> “but the lower end of the scale is illegible.”

Paragraph 105-Delete this paragraph

How to add a scale-value column

Paragraph 109-“For this explanation, I will assume” -> “For this explanation, I will…”

Method 1: Use a Log Scale

Paragraph 126-“We need not” -> “We do not need”

Paragraph 133-For this, you can just write something to the effect of “For the purposes of this tutorial and its learning goals, you do not need to know the specifics of the following code. It simply replaces log values with original values.” I would also move the code later, since it seems that readers should paste it in just before the point where it is used in paragraph 135.
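The idea of replacing log values with original values in a legend can be illustrated generically. This is a minimal sketch, not the lesson's code: it assumes a hypothetical maximum count of 250, picks evenly spaced tick positions on the log10 scale, and labels each one with the count it represents.

```python
import numpy as np

# Assumed maximum count in the data (illustrative only).
max_count = 250

# Evenly spaced tick positions on the log10 scale, from log10(1) = 0 to log10(max_count).
log_ticks = np.linspace(0, np.log10(max_count), num=4)

# Undo the log transform so the legend shows counts rather than exponents.
labels = [f"{10 ** t:.0f}" for t in log_ticks]

for t, label in zip(log_ticks, labels):
    print(f"{t:.2f} -> {label}")
```

The colours are still assigned on the log scale; only the text shown to the reader changes.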

Method 2: Cap the Scale Manually

I don’t believe that you need this section. It is too confusing with all the moving pieces. Using the first method is sufficient, so I would remove this and fix the wording earlier to reflect that you are just using the log-method.

Adding a Floating Information Box

If you remove Method 2, you would need to rework this section to not talk about the capped scale.

Paragraph 162-Combine with following paragraph

Paragraph 169-Replace with the following: “Let’s see how this code functions. First, we iterate over all rows in the ‘features’ part of the GeoJSON data.”
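The step described for paragraph 169, iterating over the features of the GeoJSON data, can be sketched in plain Python. This assumes each feature carries a `FIPS` property and that the values to display live in a dict keyed by FIPS; both are assumptions, and the lesson's own code may be structured differently. In Folium, a tooltip such as `folium.GeoJsonTooltip(fields=[...])` can then read an added property.

```python
# A tiny GeoJSON-like dict standing in for the county geometries (coordinates omitted).
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"FIPS": "01001", "NAME": "Autauga"}, "geometry": None},
        {"type": "Feature", "properties": {"FIPS": "06037", "NAME": "Los Angeles"}, "geometry": None},
    ],
}

# Per-county values to show in the information box (illustrative numbers).
counts = {"06037": 250}

# Visit every feature and copy the matching value into its properties,
# where a tooltip can later read it. Counties missing from `counts` get 0.
for feature in geojson["features"]:
    fips = feature["properties"]["FIPS"]
    feature["properties"]["count"] = counts.get(fips, 0)

print([f["properties"]["count"] for f in geojson["features"]])  # [0, 250]
```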

Paragraph 183-See if this still makes sense if using only the log method.

Add a Title

Paragraph 189-“Here’s what the next cell does:” -> “Let’s look at how the code works before using it for our map.”

Normalizing Population Data

Paragraph 209: "Pandas allows me" -> "Pandas allows us"

@anisa-hawes anisa-hawes moved this from 2 Initial Edit to 3 Revision 1 in Active Lessons May 26, 2024
@anisa-hawes
Contributor

Hello Adam @adamlporter,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 3: Revision 1.

This phase is an opportunity for you to revise your draft in response to @nabsiddiqui's initial feedback.

I've sent you an invitation to join us as an Outside Collaborator here on GitHub. This will give you the 'write access' you'll need to edit your lesson directly.

We ask authors to work on their own files with direct commits: we prefer you don't fork our repo, or use the Pull Request system to edit in ph-submissions. You can make direct commits to your file here: /en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.md. Charlotte and I can help if you encounter any practical problems or have questions.

When you and Nabeel are both happy with the revised draft, we will move forward to Phase 4: Open Peer Review.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 2 <br> Initial Edit
Who worked on this? : Editor (@nabsiddiqui) 
All  Phase 2 tasks completed? : Yes
Section Phase 3 <br> Revision 1
Who's working on this? : Author (@adamlporter)  
Expected completion date? : June 27
Section Phase 4 <br> Open Peer Review
Who's responsible? : Reviewers (TBC) 
Expected timeframe? : ~60 days after request is accepted

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@adamlporter
Collaborator

@nabsiddiqui, thanks for your many good suggestions and corrections. I have edited the document to reflect them:

  • the biggest change is getting rid of the appendix and moving the discussion of normalizing data up to the middle of the lesson
  • I've simplified the discussion of creating the scale-value / cap by only describing the log scale. I've collapsed the others into a bullet list of "other options."

I've pushed the edited version of the document to Github, so everyone can review it.

@charlottejmc, unfortunately these changes mean that the map images for this section (and some subsequent sections) may need to be re-generated and inserted into the article in the appropriate places. I'm happy to make the images / screen shots.

Because the maps that are generated are interactive, @charlottejmc made the excellent suggestion that I create some animated GIFs that show how the maps can be scrolled / zoomed over. This would be especially nice for the info-tip box feature. I've done a bit of Googling and think I can create several of these.

Where should I send them or how would I set them up in the article?

Also, I realized there is a second US Census Bureau datafile that should probably be copied into the assets folder. It can be found here: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv . It is referenced in the section of the lesson about normalizing data.
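For the normalizing step, a minimal sketch of turning counts into rates per 100,000 residents follows. The column names `STATE`, `COUNTY` and `POPESTIMATE2019` are assumptions to check against the CSV header, the rows are toy values, and the real file may need `encoding="latin-1"` when read with `pd.read_csv` and may also contain state-level summary rows that should be filtered out first.

```python
import pandas as pd

# Toy rows shaped like the Census population estimates file
# (column names are assumptions; check them against the real CSV header).
pop_df = pd.DataFrame({
    "STATE": [1, 6],
    "COUNTY": [1, 37],
    "POPESTIMATE2019": [55_000, 10_000_000],
})

# Build the five-digit FIPS code from the numeric state and county codes.
pop_df["FIPS"] = (
    pop_df["STATE"].astype(str).str.zfill(2) + pop_df["COUNTY"].astype(str).str.zfill(3)
)

# Toy per-county counts (illustrative only).
map_df = pd.DataFrame({"FIPS": ["01001", "06037"], "count": [1, 250]})

# Join on FIPS and express each count per 100,000 residents.
merged = map_df.merge(pop_df[["FIPS", "POPESTIMATE2019"]], on="FIPS", how="left")
merged["per_100k"] = merged["count"] / merged["POPESTIMATE2019"] * 100_000
print(merged[["FIPS", "count", "per_100k"]])
```

The toy numbers show why this matters: the county with 250 incidents and the county with 1 incident end up with similar per-capita rates.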

@anisa-hawes suggested that I set up a Colab / Jupyter notebook with just the code cells that could be accessed / run by users. I love this idea -- it could be run on Binder, so folks could execute it very easily without needing to download it to their own computers or to open it in Colab.

Do you have suggestions about the sort of language I should use to help users follow from one file to the other? When I've finished this, to whom should I send it?

Thanks! Adam

@adamlporter
Collaborator

I've created three GIFs to show how users can move around the map and zoom in.
I've also created four PNGs to replace images in the current document that no longer correspond with the revised text.
I've edited the MD file to show where these seven new images should either be inserted or replace existing images.

I've also created a Colab / Jupyter notebook that has all the code for the article.

All this material can be found in this Google Drive folder

@anisa-hawes
Contributor

Dear @adamlporter,

Many thanks for your work on the revisions in response to @nabsiddiqui's feedback.

At the moment, we are providing codebooks rendered with nbviewer (for example: understanding-creating-word-embeddings.ipynb), which are openable in Colab for those readers who wish to work there, while remaining accessible for those who want to work in other cloud-based environments or locally instead. MyBinder is an interesting suggestion, and one which has also been recommended to me by someone else in our network. Thank you - I'll investigate our options.

We'll download your additional images and gifs, and adjust the figure sequence as required. We'll write you a note here to confirm when the preview is updated. Thank you for your patience.

Best,
Anisa

@charlottejmc
Collaborator

charlottejmc commented Jun 7, 2024

Hello @adamlporter,

I've added the additional datafile co-est2019-alldata.csv to the lesson's assets folder, alongside the Colab notebook you've provided – and I added a short line in the lesson to point readers to it.

I've also reordered all the images in the markdown file to prepare to re-upload the updated set of .pngs and .gifs. Just a couple points:

  • Am I correct that you deleted the figures 07 and 08 from the initial draft?
  • Am I correct that the total number of .pngs and .gifs is now 14?
  • Unfortunately, the 3 .gifs are currently above GitHub's maximum file size of 25MB, so I cannot upload them to our repository. I've tried compressing them using an online tool, but this reduces the quality too much. Do you think you would be able to provide a new set of .gifs that are under 25MB? If not, we can try and find an alternative solution together.

Meanwhile, would you please be able to provide a caption and alt-text description for each image? You could edit the markdown file directly and insert them within the liquid syntax:
{% include figure.html filename="en-or-data-into-choropleth-maps-with-python-and-folium-XX.png" alt="Visual description of figure image" caption="Figure XX. Caption text to display" %}, or if you would rather write them in a comment below, I can add them in myself.

Thank you very much for your patience!

@charlottejmc
Collaborator

Thank you @nabsiddiqui for your detailed comments.

Hi @adamlporter, you'll see that I have ticked off a number of changes from Nabeel's list, as I was able to implement them easily myself. I've left the more involved changes to you!

@anisa-hawes anisa-hawes moved this from 3 Revision 1 to 4 Open Peer Review in Active Lessons Jul 11, 2024
@anisa-hawes
Contributor

anisa-hawes commented Jul 11, 2024

Thank you for reviewing Adam's revisions, @nabsiddiqui ✨ You'll see that @charlottejmc has taken care of checking off the typing errors and small copyedits you've noted above. Often it makes best sense not to worry about these details too much in the early stages because many words, sentences and paragraphs may change through revision and review. Generally, I think we can benefit from taking a 'one touch' approach to catch all copyedits, grammatical adjustments, and typographical tweaks as part of the Publishing Team's work in Phase 6 Sustainability + Accessibility.

--

Hello Adam @adamlporter,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 4: Open Peer Review.

This phase is an opportunity for you to hear feedback from peers in the community.

Nabeel @nabsiddiqui will invite two reviewers to read your lesson, test your code, and provide constructive feedback. In the spirit of openness, reviews will be posted as comments in this issue (unless you specifically request a closed review).

After both reviews, Nabeel will summarise the suggestions to clarify your priorities in Phase 5: Revision 2.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 3 <br> Revision 1
Who worked on this? : Author (@adamlporter)
All  Phase 3 tasks completed? : Yes
Section Phase 4 <br> Open Peer Review
Who's working on this? :  @rnelson2 + @fmvaldezg
Expected completion date? : October 19
Section Phase 5 <br> Revision 2
Who's responsible? : Author (@adamlporter )
Expected timeframe? : ~30 days after editor's summary

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@adamlporter
Collaborator

adamlporter commented Jul 11, 2024

@nabsiddiqui -- many thanks for the detailed list of corrections.
@charlottejmc -- Thanks for fixing many of these!

I'm confused about what I need to do next -- @anisa-hawes has moved this into the peer-review queue. Should I move forward with making changes to the existing document or should I wait to get feedback from the reviewers? Either is fine with me -- just let me know what I should do.

Thanks,
Adam

@nabsiddiqui
Collaborator

Hey @adamlporter,

I think we can move forward to the next step. I will reach out to reviewers.

@adamlporter
Collaborator

@nabsiddiqui Thanks!

@anisa-hawes
Contributor

anisa-hawes commented Jul 12, 2024

Dear @adamlporter,

I apologise for the confusion I've caused by posting the Phase-change diagram too soon. While @nabsiddiqui begins the work of identifying and making initial contact with potential peer reviewers, there is time for you to work through any remaining suggested edits from the checklist above if you'd like to.

With thanks,
Anisa

@adamlporter
Collaborator

@anisa-hawes Perfect -- I'll get to work on it. Thanks! Adam

@adamlporter
Collaborator

@nabsiddiqui - quick question: do you have a tool that numbers the paragraphs?

I'm editing this in Visual Studio Code. Perhaps you have a better editor you could recommend? (one that displays paragraph numbers?)
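Short of a dedicated editor, a small script can approximate paragraph numbers for a Markdown draft. This is a minimal sketch under loose assumptions: it treats every blank-line-separated block outside fenced code as one paragraph, so headings and list blocks are counted too, and its numbering may not match the preview site's exactly. The function name `number_paragraphs` is hypothetical.

```python
FENCE = "`" * 3  # a Markdown code fence marker


def number_paragraphs(markdown_text: str) -> list[tuple[int, str]]:
    """Return (number, first line) for each blank-line-separated block of text.

    Lines inside fenced code blocks are skipped. This only approximates
    how a site renderer might number paragraphs.
    """
    paragraphs = []
    current = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            # A fence also ends any paragraph in progress.
            if current:
                paragraphs.append(current)
                current = []
            continue
        if in_fence:
            continue
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return [(i, block[0]) for i, block in enumerate(paragraphs, start=1)]
```

Reading the lesson file with `open(path, encoding="utf-8").read()` and printing the returned pairs would give a rough paragraph index to check against the preview.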

Thanks,
Adam

@adamlporter
Collaborator

@nabsiddiqui @anisa-hawes @charlottejmc

I've made a lot of changes to the file. Since I've deleted the "other options" paragraphs (140-46), I've also deleted the sections about how to create a scale variable / lambda functions. This material is only needed if folks want to explore "other options." If all we want to do is to add a log10() column, numpy makes it very easy (map_df['MapScale'] = np.log10(map_df['count'])).

I've also re-worked the conclusion, to add some links to discussions of choropleth maps and map making / map-story telling more generally.

I will work on the I/We and passive voice after getting feedback from the reviewers.

Thanks!
Adam

@nabsiddiqui
Collaborator

Open Peer Review

During Phases 2 and 3, I provided initial feedback on this lesson, then worked with @adamlporter to complete a first round of revisions.

In Phase 4 Open Peer Review, we invite feedback from others in our community.

Welcome Rob Nelson @rnelson2 and Felipe Valdez @fmvaldezg. By participating in this peer review process, you are contributing to the creation of a useful and sustainable technical resource for the whole community. Thank you.

Please read the lesson, test the code, and post your review as a comment in this issue by Saturday October 19, 2024. If you need an extension, please let me know.

Reviewer Guidelines:

A preview of the lesson:

--
Notes:

  • All participants in this discussion are advised to read and be guided by our shared Code of Conduct.
  • Members of the wider community may also choose to contribute reviews.
  • All participants must adhere to our anti-harassment policy:

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

Programming Historian in English is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the above described activity, please contact our ombudsperson Dr Ian Milligan. Thank you for helping us to create a safe space.

@rnelson2

rnelson2 commented Sep 6, 2024

@adamlporter (and @nabsiddiqui):

Great lesson.

I only have one thought that's even remotely critical. There are a lot of techniques covered in the lesson beyond using Folium to generate maps. Half of the lesson is about transforming and joining the data, which in most cases is probably more work than generating the map. I mention this because I do find myself wondering how easily someone relatively new to mapping who read this lesson could begin to produce choropleth maps using different data that interested them. That data would almost certainly require different kinds of manipulations and transformations than what's covered here, and I can easily envision that person getting stuck very quickly.

Having said that, that's the nature of the work. The lesson provides an example of manipulating a dataset and joining it with spatial data, and that's probably as much as it can do; people have to spend the hours learning, through further research and trial and error, how to manipulate their own data. As you say, the "lesson assumes some proficiency with Python." Just a thought that I offer in no way to detract from what's a very solid and effective lesson.

I do have a few suggestions, which I think are all quite minor.

  • Around paragraph 15, you might suggest that users open up two browser windows side by side or use a browser with split screen functionality to have the lesson and the Colab notebook open side-by-side as they're proceeding through the lesson. That's what I did, and presuming they have enough screen space on their monitor, I think it's a much more effective way of moving through the lesson than switching back and forth.
  • The code on paragraph 33 isn't in the notebook.
  • In paragraph 55, you might provide a brief explanation of what a CRS is, particularly as you use the acronym several times in the subsequent paragraph. It's obviously an important concept for anyone doing anything with spatial data.
  • In paragraph 68, you might explain a bit more about why .reset_index() converts map_df to a DataFrame. It wasn't obvious to me at least, which might just be me as I don't do a lot of programming in Python or with pandas.
  • In paragraph 81, the parameters are in a different order than they were in paragraph 77. The ones in 77 are better for your explanation with data appearing before key_on. That way you can swap the explanations of those parameters so that you can mention the data (map_df) before referencing it when describing key_on.
  • I think something's missing from the first sentence of the last paragraph: "In short, choropleth maps may be a way of displaying data and informing readers about topics": "powerful way", "effective way"—something like that. Even with that, the "may be" is an unnecessary qualification at the end of the lesson. I'd suggest revising it to say something more direct about what choropleth maps do particularly effectively.
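On the .reset_index() point: a pandas groupby(...).size() returns a Series indexed by the grouping key, and reset_index() moves that index back into an ordinary column, which yields a DataFrame. A minimal sketch, assuming a hypothetical ff_df with one row per incident and a FIPS column (the values are made up, and the lesson's own code may build map_df differently):

```python
import pandas as pd

# Hypothetical incidents table: one row per incident, keyed by county FIPS code
ff_df = pd.DataFrame({"FIPS": ["45001", "45003", "45001", "45005", "45001"]})

# Grouping and counting gives a Series whose index holds the FIPS codes
counts = ff_df.groupby("FIPS").size()
print(type(counts).__name__)  # Series

# reset_index() turns the index into a regular "FIPS" column, producing a DataFrame;
# name= sets the label of the column that holds the counts
map_df = counts.reset_index(name="count")
print(type(map_df).__name__)  # DataFrame
print(map_df)
```

Once FIPS is an ordinary column, it can serve as the key column when the counts are matched to the county geometries.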

Stylistic

¶ 4: drop the "Know how to"s. Readers will be able to "Associate latitude/longitude ..." and "Normalize data ..."
¶ 73: You might change "The following code block initializes a map object" to "The following code defines a Python function that initializes a map object." Very minor, obviously, but more precise.
¶ 122: I'd change var newvalue = Math.pow(10.0, value).toFixed(1).toString() to var newvalue = Math.round(Math.pow(10.0, value)).toString(). I found the scale with its decimal places a little weird and maybe even off-putting as these are lives lost that shouldn't be counted in fractions.
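To see why the rounding matters: legend ticks on a log scale usually sit at evenly spaced log10 values, so converting them back with 10 ** value produces non-integers. A Python sketch of what the two JavaScript expressions compute, assuming hypothetical tick positions (it mirrors the logic only; it is not the lesson's JavaScript):

```python
# Hypothetical legend tick positions, evenly spaced in log10 space
log_ticks = [0.0, 0.5, 1.0, 1.5, 2.0]

# Like Math.pow(10.0, value).toFixed(1): one decimal place on every label
labels_fixed = [f"{10 ** v:.1f}" for v in log_ticks]

# Like Math.round(Math.pow(10.0, value)): whole numbers only
labels_rounded = [str(round(10 ** v)) for v in log_ticks]

print(labels_fixed)    # ['1.0', '3.2', '10.0', '31.6', '100.0']
print(labels_rounded)  # ['1', '3', '10', '32', '100']
```

Python's round() sends exact halves to the nearest even number, whereas JavaScript's Math.round() rounds them up; the two only differ for values ending in exactly .5.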

Typos

¶ 2 spacial
¶ 6 June, — drop the comma
¶ 10 unfamilar
¶ 43 came (instead of same).
¶ 71 passs
¶ 78 intersted
¶ 97 uneavenly
¶ 116 logrithm
¶ 119 ned
¶ 162 becuse
¶ 170 missing period at the end of the paragraph
¶ 194 I don't believe "lead levels" should be hyphenated

@nabsiddiqui
Collaborator

Thank you @rnelson2. This is a wonderful review. @adamlporter you can make some of the copyedit changes now. I would wait until @fmvaldezg responds for more substantial edits.

@adamlporter
Collaborator

adamlporter commented Sep 13, 2024 via email

@anisa-hawes
Contributor

Hello Adam @adamlporter,

It's great that you're already starting to think-through your next steps, considering the first reviewer's feedback!
Just a quick reminder that it is important to wait until both reviews are received before beginning to implement changes.

When both reviews are in, Nabeel @nabsiddiqui will summarise them to clarify priorities for Phase 5 revisions. Thank you for your patience.

@fmvaldezg

Hi @adamlporter,

Great lesson. I followed the steps both in the Colab notebook and by copying and pasting the code into a Jupyter notebook. I generally agree with @rnelson2 that most of the lesson focuses on transforming the data. It would be interesting to give those using the lesson the option of either jumping directly to generating the map with the previously transformed data or working through the entire lesson.
That said, here are some recommendations and observations about the lesson:

¶ 14. Add some examples of shapely data formats.
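As an illustration of what such examples might cover: GeoPandas can read spatial formats such as shapefiles, GeoJSON and GeoPackage, and Folium's choropleth layers take GeoJSON. A minimal sketch of a single county polygon written as GeoJSON with the standard library; the FIPS and NAME properties and the coordinates are hypothetical:

```python
import json

# A hypothetical county outline as a GeoJSON Feature (RFC 7946).
# Coordinates are [longitude, latitude], and the ring's first and last points match.
feature = {
    "type": "Feature",
    "properties": {"FIPS": "45001", "NAME": "Example County"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-82.5, 34.0], [-82.0, 34.0], [-82.0, 34.5], [-82.5, 34.5], [-82.5, 34.0],
        ]],
    },
}

# Several features are grouped into a FeatureCollection, the usual file-level object
collection = {"type": "FeatureCollection", "features": [feature]}
text = json.dumps(collection)

# Reading it back: the outer ring of the first polygon
ring = json.loads(text)["features"][0]["geometry"]["coordinates"][0]
print(ring)
```

The same outline in Well-Known Text (WKT), another common encoding, would be POLYGON ((-82.5 34, -82 34, -82 34.5, -82.5 34.5, -82.5 34)).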

¶ 15. State the need to have a Google account to execute code in Colab.

¶ 16. Typo on ‘modern’ web-browser.

¶ 43. Typo on ‘same’ column names.

¶ 49. It is essential to state why sjoin is used instead of merge or join. Alternatively, paragraphs 55-57 could be moved to the very beginning of this section so that all paragraphs related to the sjoin process are together. I suggest referencing the University Consortium GIS Body of Knowledge entry on ‘Spatial Joins’.
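The core distinction: merge and join match rows on a shared key column, while a spatial join matches rows by geometry, for example assigning each incident point to the county polygon that contains it. GeoPandas does this with geopandas.sjoin; the pure-Python sketch below only illustrates the idea, using a ray-casting point-in-polygon test and two hypothetical square "counties" (it is not how GeoPandas implements it):

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray cast rightwards from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def spatial_join(points, polygons):
    """Attach to each point the key of the first polygon containing it (None if none)."""
    joined = {}
    for point_id, (x, y) in points.items():
        joined[point_id] = next(
            (key for key, ring in polygons.items() if point_in_polygon(x, y, ring)),
            None,
        )
    return joined


# Hypothetical county outlines (x, y vertices) and incident locations
counties = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
incidents = {"p1": (0.5, 0.5), "p2": (1.5, 0.25), "p3": (3.0, 3.0)}

print(spatial_join(incidents, counties))  # {'p1': 'A', 'p2': 'B', 'p3': None}
```

Points lying exactly on a shared border are ambiguous in this sketch; in GeoPandas the predicate argument of sjoin (for example "within" or "intersects") controls how such cases are matched.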

¶ 55. Include a brief explanation of the importance of defining a CRS.
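A possible short definition: a coordinate reference system (CRS) states how coordinate numbers correspond to places on the Earth. The same point has different coordinates in different CRSs, which is why two layers must share a CRS before a spatial join. A sketch converting longitude/latitude in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857) with the standard spherical Mercator formulas; the sample point is hypothetical, and in GeoPandas this conversion would normally be done with GeoDataFrame.to_crs():

```python
import math

# Radius used by Web Mercator (EPSG:3857): the WGS 84 semi-major axis, in metres
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 longitude/latitude (degrees) to EPSG:3857 x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


lon, lat = -81.0, 34.0  # hypothetical point
x, y = wgs84_to_web_mercator(lon, lat)
print(f"EPSG:4326 (degrees): ({lon}, {lat})")
print(f"EPSG:3857 (metres):  ({x:.0f}, {y:.0f})")
```

If one layer were left in degrees and the other in metres, points would land nowhere near the polygons they belong to, and the join would silently return no matches.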

¶ 63. I suggest renaming the title of this section to something like ‘Summarizing data by county’.

¶ 70. Format ‘folium.Map’ as a code snippet.

¶ 130. The file located in GitHub did not work. The reading process is successful if the census file is used:

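import pandas as pd
# Read only the state code, county code and 2019 population estimate from the census file
pop_df = pd.read_csv('https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv', usecols=['STATE', 'COUNTY', 'POPESTIMATE2019'], encoding='ISO-8859-1')
pop_df.head()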
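Because this census file stores the state and county codes in separate numeric columns, they will likely need to be combined into a zero-padded five-digit FIPS string before joining with other county data. A sketch using a small made-up stand-in for pop_df; treating COUNTY == 0 rows as state totals is an assumption to check against the file's documentation:

```python
import pandas as pd

# Hypothetical stand-in for the columns read above (all values are made up)
pop_df = pd.DataFrame({
    "STATE": [45, 45, 45, 1],
    "COUNTY": [0, 1, 3, 1],
    "POPESTIMATE2019": [500, 200, 300, 400],
})

# Assumption: rows with COUNTY == 0 are state totals rather than counties
county_df = pop_df[pop_df["COUNTY"] != 0].copy()

# Build a 5-digit FIPS string: 2-digit state code + 3-digit county code
county_df["FIPS"] = (
    county_df["STATE"].astype(str).str.zfill(2)
    + county_df["COUNTY"].astype(str).str.zfill(3)
)
print(county_df[["FIPS", "POPESTIMATE2019"]])
```

An alternative is to pass dtype={'STATE': str, 'COUNTY': str} to read_csv so that the leading zeros are never lost in the first place.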

Additional comments

I see a problem in using civilian deaths for multiple years (2015-2024) and normalizing them with population data for 2022 only. In my opinion, the rate should be calculated for one year only.
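One way to follow this suggestion, sketched with hypothetical data: keep only one year's incidents, count them per county, and divide by that year's population, expressed per 100,000 residents. The column names year and pop_2022 and all values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical incidents (one row each) and a matching one-year population table
deaths = pd.DataFrame({
    "FIPS": ["45001", "45001", "45001", "45003", "45005"],
    "year": [2022, 2022, 2016, 2022, 2019],
})
pop = pd.DataFrame({
    "FIPS": ["45001", "45003", "45005"],
    "pop_2022": [200_000, 50_000, 100_000],
})

# Count only the incidents from the same year as the population estimate
one_year = (
    deaths[deaths["year"] == 2022]
    .groupby("FIPS").size()
    .reset_index(name="deaths")
)

# Left merge from the population table keeps counties with no incidents that year
rates = pop.merge(one_year, on="FIPS", how="left").fillna({"deaths": 0})
rates["per_100k"] = rates["deaths"] / rates["pop_2022"] * 100_000
print(rates)
```

An inner merge here would silently drop counties with zero incidents from the map, which is why the sketch merges from the population side.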

When trying to reproduce the process in a Jupyter notebook, I found that when summarizing the ff_df dataframe by the FIPS column, the name of the new column does not default to count (as it does when executing the line in the Colab notebook), which results in an error when displaying the map using this column. I used map_df.rename(columns={0:'count'}, inplace=True) to rename the column, and that solved the error.
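One plausible cause, though the thread does not confirm it, is a pandas version difference: since pandas 2.0, value_counts() names its result "count", while older versions returned an unnamed Series from DataFrame.value_counts(), so reset_index() labelled the column 0. A sketch of two version-independent options, with a hypothetical ff_df:

```python
import pandas as pd

ff_df = pd.DataFrame({"FIPS": ["45001", "45003", "45001", "45001"]})  # hypothetical

# Option 1: Felipe's rename (assignment instead of inplace=True);
# it does nothing when the column is already called "count"
map_df = ff_df.value_counts(subset=["FIPS"]).reset_index()
map_df = map_df.rename(columns={0: "count"})

# Option 2: name the index and the counts column explicitly
map_df2 = ff_df["FIPS"].value_counts().rename_axis("FIPS").reset_index(name="count")

print(map_df)
print(map_df2)
```

Either form gives the same two columns, FIPS and count, in Colab and in a local Jupyter install.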

In the Colab notebook, on line 5, I suggest changing the question to 'what percent of the records have latitude/longitude data?' since it is more aligned with the code result.
After paragraph 119, be sure to state that this section of the code should be run at that point; if it is run before cp is defined, it will result in an error.
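The reworded question can be answered with a single calculation. A sketch assuming hypothetical latitude and longitude column names (the notebook's actual names may differ):

```python
import pandas as pd

# Hypothetical records; some are missing one or both coordinates
df = pd.DataFrame({
    "latitude": [34.0, None, 33.9, 34.2],
    "longitude": [-81.0, -80.9, None, -81.1],
})

# A record counts only if both coordinates are present
has_coords = df[["latitude", "longitude"]].notna().all(axis=1)
pct = has_coords.mean() * 100
print(f"{pct:.1f}% of records have latitude/longitude data")  # 50.0% of records ...
```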

@anisa-hawes
Contributor

Many thanks, Felipe @fmvaldezg!

--

Hello @adamlporter,

Now that both reviews are received, @nabsiddiqui will summarise the two reviews so that you have a clear path forwards for your Phase 5 revisions.

Thank you,
Anisa

@nabsiddiqui
Collaborator

nabsiddiqui commented Oct 8, 2024

Thank you @felipelmc and @rnelson2 for the wonderful reviews, and thank you again, @adamlporter for the lesson.

Based on the comments, I believe that the following changes should be made to the lesson, and then we can move forward with publishing it. I have divided them into two sections: Rob's Comments and Felipe's Comments.

Rob's Comments

Regarding Rob's comment about the lesson's focus on transforming the data, I believe one way to address this is to add a few sentences or a short paragraph explaining that the emphasis on data transformation is meant to illustrate how data is likely to need transforming in a real-world scenario. As Rob mentioned, "that's the nature of the work."

  • Add a sentence around paragraph 15 suggesting that opening two browser windows side by side might be helpful
  • Add the code in paragraph 33 to the notebook
  • Make sure to explain what a CRS (coordinate reference system) is
  • Put parameters in paragraph 81 in the same order as paragraph 77
  • Remove "may be" and use "powerful way" or "effective way" in the first sentence of the last paragraph
  • Make the typographic and stylistic changes Rob has outlined
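Why the parameter order helps: in folium.Choropleth, data supplies the key and value columns, and key_on is a dotted path (for example "feature.properties.FIPS"; the property name is an assumption here) telling Folium where, inside each feature of geo_data, to find the value that matches data's key column. Introducing data first makes key_on easier to explain. The pure-Python sketch below only mimics that lookup with hypothetical features and counts; it is not Folium's own code:

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.FIPS' into one GeoJSON feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature.'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# Hypothetical geo_data with three county features (geometries omitted)
geo_data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"FIPS": fips}, "geometry": None}
        for fips in ["45001", "45003", "45005"]
    ],
}

# Hypothetical stand-in for map_df's key column (FIPS) and value column (count)
counts_by_fips = {"45001": 3, "45003": 1}

matched = {}
for feature in geo_data["features"]:
    key = resolve_key_on(feature, "feature.properties.FIPS")
    matched[key] = counts_by_fips.get(key)  # None marks a county with no data

print(matched)  # {'45001': 3, '45003': 1, '45005': None}
```

Features whose key finds no match in data (45005 here) are the ones Folium would shade with its nan_fill_color setting.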

Felipe's Comments

  • In paragraph 14, add some examples of shapely data formats
  • In paragraph 15, state that you will need a Google account to execute the code
  • Paragraphs 55-57: move them towards the beginning of the section, near paragraph 49
  • Make any stylistic or typographic changes that Felipe has outlined that Rob has not already outlined
  • Paragraph 63, rename the title of the section to be more descriptive
  • Paragraph 70, format as code snippet
  • Paragraph 130, fix the reading process
  • Address the "additional comments" section that Felipe has made

@anisa-hawes anisa-hawes moved this from 4 Open Peer Review to 5 Revision 2 in Active Lessons Oct 9, 2024
@anisa-hawes
Contributor

Hello Adam @adamlporter,

What's happening now?

Your lesson has been moved to the next phase of our workflow which is Phase 5: Revision 2.

This phase is an opportunity for you to revise your draft in response to the peer reviewers' feedback.

Nabeel @nabsiddiqui has summarised their suggestions, but feel free to ask questions if you are unsure.

Please make revisions via direct commits to your file: /en/drafts/originals/data-into-choropleth-maps-with-python-and-folium.md. @charlottejmc and I are here to help if you encounter any difficulties.

When you and Nabeel are both happy with the revised draft, the Managing Editor @hawc2 will read it through before we move forward to Phase 6: Sustainability + Accessibility.

%%{init: { 'logLevel': 'debug', 'theme': 'dark', 'themeVariables': {
              'cScale0': '#444444', 'cScaleLabel0': '#ffffff',
              'cScale1': '#882b4f', 'cScaleLabel1': '#ffffff',
              'cScale2': '#444444', 'cScaleLabel2': '#ffffff'
       } } }%%
timeline
Section Phase 4 <br> Open Peer Review
Who worked on this? : Reviewers (@felipelmc + @rnelson2)
All  Phase 4 tasks completed? : Yes
Section Phase 5 <br> Revision 2
Who's working on this? : Author (@adamlporter)
Expected completion date? : Nov 9
Section Phase 6 <br> Sustainability + Accessibility
Who's responsible? : Publishing Team
Expected timeframe? : 7~21 days

Note: The Mermaid diagram above may not render on GitHub mobile. Please check in via desktop when you have a moment.

@anisa-hawes
Contributor

Hello Adam @adamlporter. How are you?

I wondered how you are getting on with your Phase 5 revisions?

Please let us know if there's anything we can do to support your next steps. Nabeel @nabsiddiqui has summarised the reviewers' suggestions to guide you, but is here to discuss any aspect. Meanwhile, Charlotte and I are on hand to help with any practicalities.

Looking forward to collaborating with you to move this lesson through the final stages.

@adamlporter
Collaborator

adamlporter commented Dec 8, 2024 via email

@anisa-hawes
Contributor

Thank you, @adamlporter. No rush - We realise how busy things are towards the year-end. Please let us know if we can help in any way.

All best for now,
Anisa

@adamlporter
Collaborator

adamlporter commented Dec 11, 2024 via email

@anisa-hawes
Contributor

anisa-hawes commented Dec 11, 2024

Thank you, Adam @adamlporter. We really appreciate your work on these revisions. I can confirm that your commits have all been successful.

Unfortunately, we cannot receive images within the comment thread, so I don't have the new version of your screenshot. May I ask if you could send this to Charlotte [[email protected]] as an attachment in a direct email? From there, we can process and upload it.

Charlotte's full name is Charlotte Chevrie. I am very grateful to you for expressing thanks to us all in the way that you have. Positive feedback from contributors helps us to know that gradual improvements and adaptations to our workflow are bringing the difference we hoped for. Thank you! ☺️

@charlottejmc
Collaborator

Thank you @adamlporter, I have received your new version of Figure 06 and replaced it in the lesson. Thank you also for your kind acknowledgements!
