Review Occupation types and subtypes #1

Open
ebeshero opened this issue Aug 10, 2018 · 27 comments

@ebeshero
Member

For new schema development, we need to review the current proposed list to streamline occupation encoding in the site index. That list is posted on the Documentation site here:

https://digitalmitford.github.io/DM_documentation/SI_Occupations_Guide.md

@lmwilson @Samwebb64 @KellieDC @ghbondar

@ebeshero
Member Author

So, this is an example of an "issue" or "ticket" we file in a GitHub repo. It's analogous to (and an improvement on) the Box comments we'd write. It's an improvement because you can edit your posts. You can link to files in the GitHub repo (click on the code tab) if you need to, and you can link out to other things, and (as you see above) we can ping members of our team by their GitHub handles.

@ebeshero
Member Author

If you're reading these posts in your e-mail, know that they are coming from a GitHub "Issues" tab on one of our Digital Mitford GitHub repos. You can see what the post looks like at its source by scrolling to the end of the e-mail--at the bottom you should see a line that reads:

You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

And if you click on the "view it on GitHub" link, you go straight to the issues. (The link isn't present in my snip of a quote above.)

@Samwebb64

@ebeshero @lmwilson
Are you guys using a specific reference source to collect occupation names? I want to make sure I use the same one to identify the ones I'm finding.

@ebeshero
Member Author

Hi @Samwebb64 ! @lmwilson did a lot of this by looking at census categories and a period pamphlet listing social classes, and the two of us consulted WorldCat, Library of Congress headings, and Wikipedia headings. We didn't find a good, single standard authority file for this. Also, we're agreed on a philosophy of "less is more", so we looked for ways to group and cluster occupations conceptually: capacious terms that work semantically to group 19c occupations that we've seen in MRM's world. And we wrote a few in camelCase (like 'seaCaptain') so none of them contain white spaces. The reason for that is, we want to make it possible to code multiple subtypes with a type, separated by a white space.

@lmwilson
Contributor

Can you generate a list for us of all the occupation types currently in use? I think I would like to check that again.

I realized when I went through si-add-chas1-findens-rienzi that I might have missed a few important ones:

  1. Rector is different from vicar in the C of E. (Unless we want to put all titles in roleName, like Vicar of Dibley, Rector of Ashe, etc.) Have we thought through the pros and cons of that method in terms of using it to analyze data? Will that work the way we want?
  2. We don't have a subtype=professor for academics from NOW. So I cobbled together 3 types: educator/scholar/etc. Maybe that is fine because we are using professor of x in roleName. I had used "orator" to capture Dickens's work as a public lecturer. I'm not sure whether that should be under type=literary or type=theater. But it will come up for others, like Coleridge. So I think we need something.
  3. We also don't have a good solution right now for law enforcement (constable? Government or law?) or positions like magistrate or sheriff (sheriff positions were mostly elected and office holders don't need law degrees). Once again, are these admin. or legal?
  4. Sam pointed out there are a bunch of occupations elected by the local church vestry, especially in the C of E: sexton, churchwarden, pew opener, etc. In Protestant churches there are deacons and elders etc. These are non-clergy who do administrative and clergy-adjacent jobs for the local church.
  5. We also want to be sensitive to the kinds of unpaid work that exist, particularly for the women. A number could go under "benefactor" but we may want additional subtypes. This gets into a whole issue about 'invisible labor' of the kind mostly women do... sewing for themselves or to make charitable items for sale or both, for example. Can we try to reflect that without completely reproducing the logic that that work doesn't "count"? As an example, think about Caddy and Mrs. Jellyby in Dickens. Sometimes with real or fictional people that kind of work is still unknown because it's unrepresented. But I think it's worth trying... Would be a fairly unique feature of DM!

Another general issue. Sam and I had discussed whether we would ever want to somehow try to use the occupation data or some other way to be able to gauge the numbers of members of different social classes. We don't yet have a way to do that and maybe there isn't one. Possibly we could use a general heading under occupation of socialClass and then use something like royal, aristocrat, landed gentry, artisan/tradesman, (tenant farmer?), etc. Some categories might be fluid over someone's lifetime, like George Mitford.

I talked to Sam and brought her up to speed on what we have done so far. She is going to put her compiled occupations list and notes in Box, in the SI IP folder we have there. That made the most sense because we agreed to keep the very drafty things in the Box folders.

@ebeshero
Member Author

ebeshero commented Aug 15, 2018

@lmwilson : That list is posted here (on the DM_documentation repo):
https://github.com/DigitalMitford/DM_documentation/blob/master/SI_Occupations_Guide.md

@Samwebb64

@ebeshero
Member Author

ebeshero commented Aug 15, 2018

About the questions: Here’s my two cents:

  1. When we were meeting and working on the occupations coding last week, you and I tried to stick to this guiding idea: The @type attribute is required on the <occupation> element, and it is broadly topical and most likely the most useful category for our analysis. So for the rector vs. vicar question it doesn’t seem like this matters nearly as much as the simple distinction of @type="religious". The @subtype list is secondary, optional, and for a given entry can hold multiple values, so for someone who was both a rector and a vicar at different points in life, we could encode
<occupation type="religious" subtype="rector vicar"/>

And we can add a new occupation element if this person was also occupied as @type="explorer".
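Put together, that might look like the sketch below. The xml:id and name are hypothetical placeholders, not an entry from the SI, and the explorer occupation is shown without a @subtype only because @subtype is optional:

```xml
<!-- Hypothetical entry: xml:id and name are placeholders, not from the SI -->
<person xml:id="Example_Person">
   <persName>Example Person</persName>
   <!-- one occupation element per @type; @subtype holds space-separated values -->
   <occupation type="religious" subtype="rector vicar"/>
   <!-- @subtype is optional, so it is simply left off here -->
   <occupation type="explorer"/>
</person>
```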

So, how important are the subtype codes? We decided pretty strongly that these should NOT be the same as what we code with <roleName>, so this shouldn’t become too specific. These should be distinct words that convey categories.

So, the question is, is rector a strong categorical distinction on the level of vicar and minister? Is vicar a strong categorical distinction from the other two? I think we wanted a sense of distinction among levels of commitment to a community or church. If we should add rector, let’s do it but be clear about why we need the subcategory.

@ebeshero
Member Author

  2. We may want to use the occupation encoding on members of every list after all (our editing team as well as people in MRM’s world), and I think adding a general subtype category of "professor" could work for anyone employed in various faculty positions at colleges and universities as institutions of “higher education”. Works for me, and this will help with speeding updates to the big SI. We can add it if you agree, too.

Currently, "orator" is only a subtype under type="government", because this general category includes political and social reformers. We talked a bit about adding it to other categories—where else does it need to be?

@ebeshero
Member Author

ebeshero commented Aug 15, 2018

  3. Doesn’t law enforcement belong in the type="government" category? Why don’t we do a simple camelCase string that will lump sheriffs and constables like so:

<occupation type="government" subtype="lawEnforce"/>

You’ll see a few other camelCase solutions like this in our subtype lists, too.

Hmm. I can see an argument for putting law enforcement under type="legal", but it needs to be in one OR the other type category, not both. What do you think?

@ebeshero
Member Author

  4. As for “clergy adjacent”, let’s come up with a good general term for the wide variety of position names here. subtype="churchAssist"?
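A rough sketch of how that could be encoded, assuming these roles sit under type="religious" (the thread has not settled that) and using a placeholder xml:id; "sexton" is one of the vestry positions mentioned earlier:

```xml
<!-- Hypothetical entry: xml:id is a placeholder; type="religious" is an assumption -->
<person xml:id="Example_Sexton">
   <persName>
      <roleName>sexton</roleName>
   </persName>
   <!-- the broad subtype lumps sexton, churchwarden, pew opener, and similar roles -->
   <occupation type="religious" subtype="churchAssist"/>
</person>
```

The specific title stays in <roleName>, so nothing is lost by keeping the subtype general.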

@ebeshero
Member Author

  5. Great idea! I guess expanding the tiny category of benefactor makes sense here, if this constitutes unpaid effort. What about these new subtypes?

domestic
volunteer

?
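For instance, assuming "benefactor" is used as the @type value and the two new words become its subtypes (an assumption, since this is only a proposal), someone who both sews for the household and makes charitable items might be coded:

```xml
<!-- Sketch only: assumes benefactor is a @type and domestic/volunteer are its subtypes -->
<occupation type="benefactor" subtype="domestic volunteer"/>
```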

@ebeshero
Member Author

Okay, last point from your post: Being able to track class mobility might be interesting, but maybe we can do that already if people are engaged in multiple occupation types? Anyone who turns up in trade as well as legal, for example, might be a person of interest.

@Samwebb64

@ebeshero @lmwilson
I just posted the occupations list in the SI IP folder in Box.

As for these points, here's what I think.

  1. I agree with these broad categories, and the "less is more" philosophy. The only category that seemed to be missing was an outlier - what we might call "freelance" or non-waged trades. But these can be included into trades, I suppose.

  2. On my list, I added subcategories to the trade. Maybe these can be used as subtypes, in the same way we might use <subtype=lawEnforce> (point 3)? Lisa and I also discussed that <subtype=orator> could apply to <type=literary>, so writers who also gave lectures, like Coleridge or Dickens.

  3. After checking on the job of a constable, I'm fairly sure that should go under the <type=government>. I didn't check "sheriff" but it seems clear the role of constable flows initially from Charles II. But that's my interpretation. (Canadians and our good government Constitution. :-)

  4. I'm in favor of going with religious subtypes that are specific to the position. I think we'll encounter many religions and positions, so if we can capture many of them, this allows us to accommodate them.

  5. I like the idea of <subtype=benefactor> for unpaid female labor doing sewing, painting, teaching in families. In some senses, these women would have been thought of as "dependents," so this is an interesting spin, and true to the spirit of a number of sketches.

One thing to keep in mind is that, in OV, many of the characters with occupations are actually identified only BY their occupations; they have no actual personal names (except occasionally some are mentioned in passing). I'm not sure if this will throw a wrench into these occupation tags, since they will also be <roleName> tags. But <roleName>s will have a redundancy, as in: The rector is the rector in OV...

@ebeshero
Member Author

ebeshero commented Aug 17, 2018

Thanks, @Samwebb64 ! I'm reading your list in Box now and thinking of ways to blend it with the list here in the repo.

  • I'm likely to try to lump some of the subcategories together: where you have "butchersBoy" and "apprentFootman", we could just indicate the occupation subtype as "butcher" and "footman", because that's a topical area of work. (Alternatively, we should be consistent about identifying the "work area" at the start of the entry and the "understudy" aspect second at the end: "butchersBoy" and "footmanApprent".) But I think, to simplify, the occupation element doesn't have to carry precise details that will come through in the rest of our <person> entry.

  • Can we think of a way to simplify kinds of religious positions, so as to distinguish titles (specific to distinct communities) from functions? If we can keep the subcategories simple, and keep in mind that we can correlate the <occupation/> element information with specific titles given in <roleName> we will reduce redundancy in our tagging.

  • You raise a good point about such redundancy related to the OV characters who aren't named. In this case, we have an opportunity to use simpler/broader category words in the <occupation/> element to pair up with a person whose only name is essentially a <roleName>. We may, indeed, want to structure such OV entries like this:

<person xml:id="butcherBoy_OV">
   <persName>
      <roleName>butcher's boy</roleName>
   </persName>
   <occupation type="trade" subtype="butcher"/>
</person>

See how that can work to use the <roleName> to complement the occupation element?

  • Shall I make a try of reconciling your list, Sam, with the list Lisa and I hammered out? And then let's review it together?

@ebeshero Did we implement this or not yet? Some of the same questions came up again from the student sweep through.

@lmwilson
Contributor

lmwilson commented Aug 17, 2018 via email

@jamesrovira

jamesrovira commented Aug 17, 2018 via email

@ebeshero
Member Author

ebeshero commented Aug 17, 2018

@lmwilson @jamesrovira @Samwebb64 The <roleName> element is inside the <persName> elements in the SI. The first <persName> element is the one we post for identifying the individual in the mouseover notes. Sometimes that first persName element contains several <roleName> elements, especially for aristocrats or royalty, and on mouseover of a name in a Mitford file on the website, those are what we see first after their most recognized name, followed by the info on birth and death dates as stored on the <birth> and <death> elements. I am pretty sure we are not outputting the <occupation> elements at all in the mouseover notes, but going directly to the <note> element because this is what our readers need to see to quickly identify a name or reference.

So, how would we use the <occupation> element? Remember, our project tracks information that can be analyzed and published in a much wider range of ways than the pop up annotations on the letters. We are preparing to post more aggregate data from the SI, and we can produce charts and graphs indicating the numbers of individuals associated with occupation categories and subcategories. Here, we don’t want too many subcategories to choose from—we probably don’t want a lot of solo outliers: lumping rather than splitting should be the rule here for our coding team, too.

We can correlate the <roleName> elements, when present, with the lumping single-word topical categories of the types and subtypes on the <occupation> element; in this way, individuality and specificity are not lost. We can generate an output that lists all the <roleName> elements connected with Trade or Government occupations in the SI, for example. And that is why I say we can think of <roleName> as supplying the individualized detail on occupations that we don’t wish to lose.

This is really to help us with aggregate processing: We can output lists of all the people engaged in X trade or in domestic service, and make charts indicating the numbers and representation of various occupations in MRM’s real world vs. perhaps the world of OV. It’s an aid to aggregate research—and to me it is pretty exciting to be able to do this—more systematically than we have been. Does that help to convey what we are hoping to accomplish with the controlled vocabulary of types and subtypes on the occupation element?

@lmwilson Yes: We can quickly output the wild and uncategorized list of occupation element contents in the SI—I’ll do it and post today, but you can see it quickly for yourself using the XPath window in oXygen: With the current SI file open, paste this XPath expression in that window:

//occupation

To review, this simple expression says, “Start at the document node (above the TEI root element) and look down the entire descendant axis (all the children and children’s children deep down through the entire XML tree) and locate every occupation element.” If you enter that in the oXygen XPath window, you will see and be able to scroll through a list of results in the bottom window.
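To make the matching concrete, here is a simplified, hypothetical slice of an SI-like file. The nesting is abbreviated, namespaces and the TEI header are left out, and the second entry's xml:id and name are placeholders. The expression //occupation would return both <occupation> elements, because the descendant axis reaches them no matter how deeply they sit below the document node:

```xml
<!-- Simplified sketch, not the real SI: namespaces and the TEI header are omitted -->
<TEI>
   <text>
      <body>
         <listPerson>
            <person xml:id="butcherBoy_OV">
               <persName><roleName>butcher's boy</roleName></persName>
               <!-- matched by //occupation -->
               <occupation type="trade" subtype="butcher"/>
            </person>
            <!-- hypothetical second entry, for illustration only -->
            <person xml:id="Example_Person">
               <persName>Example Person</persName>
               <!-- also matched by //occupation -->
               <occupation type="religious" subtype="rector vicar"/>
            </person>
         </listPerson>
      </body>
   </text>
</TEI>
```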

@ebeshero
Member Author

Just proofed and corrected my post a bit—go read on GitHub rather than email. :-)

@ebeshero
Member Author

@lmwilson Here is a table containing the contents of each distinct occupation element in the current SI. I've output it as a numbered table, together with a count of all the times this value appears, and an output list of the first <persName> element in the <person> entry that contains this occupation value.

https://digitalmitford.github.io/DM_documentation/SI_currentOccupationsTable.html

@ebeshero
Member Author

ebeshero commented Aug 18, 2018

I've just refreshed the output to sort it by count (of the number of times each value actually appears in the SI), and I updated the explanation at the top.

You'll see we have quite a lot of "one-off" values, which is motivating us to make the occupations lists more systematic now! I favor the idea of correlating roleName with occupation when we examine this information, because it frees us from having to use every precise word available in a specific context for a kind of occupation. Hmm. Maybe I'll output another column in the table to show the roleName elements associated with each occupation value.

Of course in our current (=old/original) system, we didn't attempt to make subtypes. And yes, we'll have lots of work to apply a new tagging system for occupations retroactively, but for this we make special schemas for Site Index editing. That and GitHub coordination will help us share the work with more than one person at a time.

@ebeshero
Member Author

And I've now added roleNames, where they were available, so we can see how these might correlate.

@ebeshero
Member Author

ebeshero commented Aug 18, 2018

Be sure to refresh your browser--wait for new stuff to come up. Sometimes GitHub pages takes a few minutes to complete an update: https://digitalmitford.github.io/DM_documentation/SI_currentOccupationsTable.html

@lmwilson
Contributor

lmwilson commented Aug 18, 2018 via email

@lmwilson
Contributor

lmwilson commented Aug 18, 2018 via email

@ebeshero
Member Author

ebeshero commented Aug 18, 2018

Just tried setting those tick marks (```) around your axis step (//) in your post to make it stand out as code --apparently it doesn't work when the response comes by email!

@lmwilson
Contributor

  3. Doesn’t law enforcement belong in the type="government" category? Why don’t we do a simple camelCase string that will lump sheriffs and constables like so:

<occupation type="government" subtype="lawEnforce"/>

You’ll see a few other camelCase solutions like this in our subtype lists, too.

Hmm. I can see an argument for putting law enforcement under type="legal", but it needs to be in one OR the other type category, not both. What do you think?

@ebeshero Elisa--Here is the discussion thread between you, me and Sam from last August on some of the grey areas in the Site Index Occupations: including what to do with unpaid labor, police, etc. It looks like we came to some good conclusions here but have not yet implemented them in the occupations list.

@ebeshero
Member Author

@lmwilson Looking back on this, if we are considering a lumping of constables and sheriffs and police as a general group of "law enforcement officials", can we simply use our @type="legal" and apply the simple @subtype="enforcement"?
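If we go that way, an entry might look like the sketch below. The xml:id is a placeholder, and the @type and @subtype values are only the ones floated in this comment, not a decision recorded in the occupations guide:

```xml
<!-- Hypothetical entry: xml:id is a placeholder; values follow the proposal above -->
<person xml:id="Example_Constable">
   <persName>
      <roleName>constable</roleName>
   </persName>
   <!-- constables, sheriffs, and police would all share this one subtype -->
   <occupation type="legal" subtype="enforcement"/>
</person>
```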

4 participants