Stylometry Workshop Prep #1

Open
ebeshero opened this issue May 9, 2018 · 2 comments

ebeshero commented May 9, 2018

Plays

Possible questions for stylometry:

  1. Role of Actor-Managers in altering plays:
    This needs only the performed versions
    Maybe: List of Macready-only variants (based on differences from the other variants)
    What evidence do we see of distinct voices in the play documents?

Training set of files:

Can we get 3 plays by two directors?
And an "Unknown" testing set (really an outsider whose attribution we know, so we know what the right answer is)
(Maybe the director on a different author?)

Processing:

  1. entirely plain text of just the play (no metadata or cast list)
  2. structural markup only (stage directions, acts, scenes, actors, and speeches)
  3. Data pulled from structural markup:
  • numbers of actors in scenes
  • director-specific variants we've identified
  • stage directions only
  • stage directions NOT in manuscript
    Question: Can any of these perform as well as the plain text alone for stylometric analysis?
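
A rough Python sketch of how the three kinds of input above might be pulled from one marked-up scene. It assumes TEI-style element names (`sp`, `speaker`, `stage`), which this thread does not confirm, and the sample scene is invented placeholder text, not project data. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files would likely be namespaced TEI, in which case
# the tag names below would need the namespace prefix.
SAMPLE = """<div type="scene">
  <stage>Enter A and B.</stage>
  <sp><speaker>A</speaker><l>First line <stage>aside</stage> continues.</l></sp>
  <sp><speaker>B</speaker><l>Second line.</l></sp>
</div>"""

SKIP = {"speaker", "stage"}  # assumption: these are not part of the spoken text


def _text_without(elem, skip):
    """Concatenate text under elem, leaving out any element whose tag is in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skip:
            parts.append(_text_without(child, skip))
        parts.append(child.tail or "")  # text after a skipped element is kept
    return "".join(parts)


def spoken_text(root):
    """Processing option 1: plain text of the speeches only."""
    return [" ".join(_text_without(sp, SKIP).split()) for sp in root.iter("sp")]


def stage_directions(root):
    """Stage directions only, in document order."""
    return [" ".join("".join(st.itertext()).split()) for st in root.iter("stage")]


def speakers(root):
    """Distinct speakers in the scene (a proxy for 'number of actors')."""
    return {(sp.text or "").strip() for sp in root.iter("speaker")}


root = ET.fromstring(SAMPLE)
print(spoken_text(root))
print(stage_directions(root))
print(len(speakers(root)))
```

Each function returns plain Python lists or sets, so the same stylometric routine could be run over each view and the results compared against the plain-text baseline.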

@juola

ebeshero added the enhancement (New feature or request) label May 9, 2018
ebeshero self-assigned this May 9, 2018

ebeshero commented May 9, 2018

Alternative: (possibly a longer collaborative research project post 25 May)

Question: Does Mitford's prose sound "more like" Jane Austen, or like Mitford herself when she writes plays? And/or like Byron when he writes plays?

Think about structural characteristics from the markup (markup data) that might be helpful for stylometry. (This is something Patrick's curious to know...) (the ontological categories are more important than the hierarchy)


ebeshero commented May 9, 2018

methods / parameters:

string-length()
number of words per sentence (sentences determined by end-stop punctuation followed by white space)

These quantitative metrics aren't really great distinguishers.
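
For reference, a minimal Python sketch of the two measures named above: text length in characters (the analogue of `string-length()`) and words per sentence, with a sentence ending at `.`, `!` or `?` followed by white space. The function names are hypothetical, and the splitting rule is as naive as described: it will also break after abbreviations such as "Mr. ".

```python
import re


def split_sentences(text):
    """Split after end-stop punctuation that is followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words_per_sentence(text):
    """Word count of each sentence, where a word is any whitespace-separated token."""
    return [len(s.split()) for s in split_sentences(text)]


def mean_sentence_length(text):
    """Average words per sentence; 0.0 for empty input."""
    counts = words_per_sentence(text)
    return sum(counts) / len(counts) if counts else 0.0


sample = "It rained. We stayed in! Did you go out?"
print(len(sample))                   # character count, like string-length()
print(words_per_sentence(sample))    # [2, 3, 4]
print(mean_sentence_length(sample))  # 3.0
```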

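Use of function words (= words that carry mainly grammatical rather than lexical meaning, such as articles, prepositions, conjunctions and pronouns, so their meaning is defined by context)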

stop words = words that are so common that processing them doesn't help

Sometimes stylometrists filter out everything except stop words, because these can be the most distinctive features of a writer's style.
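
A minimal Python sketch of that filtering step: keep only the stop words and turn them into a relative-frequency profile for each text. The ten-word list is an illustrative assumption, not a list used by this project, and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative only; a real study would use a much longer function-word list.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "but", "that", "with", "for")


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keep_stop_words(text, vocab=FUNCTION_WORDS):
    """Discard every token that is not in the stop-word list."""
    return [t for t in tokenize(text) if t in vocab]


def function_word_profile(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each listed word, as a share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t in vocab)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in vocab}


sample = "The cat and the dog"
print(keep_stop_words(sample))               # ['the', 'and', 'the']
print(function_word_profile(sample)["the"])  # 0.4
```

Profiles like these can then be compared between a known training text and the "Unknown" test text with any distance measure.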
