IngoKl/PyXMLConc

PyXMLConc

PyXMLConc is a very simple concordancer intended for the exploratory analysis of XML-annotated corpora. Its primary feature is the automatic detection of XML tags and attributes. The search/concordancing function supports regular expressions.
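To illustrate what automatic detection of tags and attributes can look like, here is a minimal sketch using only the Python standard library. It is not PyXMLConc's actual code; the function name detect_tags_and_attributes and the sample corpus are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def detect_tags_and_attributes(xml_text):
    """Map each tag name to the sorted attribute names seen on it.

    Hypothetical helper for illustration; not part of PyXMLConc.
    """
    root = ET.fromstring(xml_text)
    found = defaultdict(set)
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        found[element.tag].update(element.attrib.keys())
    return {tag: sorted(attrs) for tag, attrs in found.items()}


# Made-up sample corpus, assumed to be well-formed XML.
sample = (
    '<text><s id="1"><w pos="DT">The</w> '
    '<w pos="NN" lemma="cat">cat</w></s></text>'
)
inventory = detect_tags_and_attributes(sample)
print(inventory)
```

An inventory like this could be used to offer the available tags and attributes as search filters without the user having to know the annotation scheme in advance.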

Note: Please be aware that this is not production software. I primarily use this tool to teach working with XML-annotated corpora. There are numerous bugs and idiosyncrasies.

Usage

After cloning the repository, run python -m pyxmlconc.pyxmlconc from the repository directory. Alternatively, install PyXMLConc by running pip install . in the same directory (the trailing dot refers to the current directory). This makes PyXMLConc available as a shell command.

The concordancer supports two working modes. The default mode (Tokenizer) tokenizes the text and builds the concordances from the individual tokens. The second mode, re.findall, uses regular expressions to search the text without prior tokenization. This mode is somewhat more flexible, but because re.findall returns only non-overlapping matches, the user has to account for overlapping hits that result in 'missing' concordances.
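The difference between the two modes, and the overlap issue in particular, can be sketched as follows. This is an illustration under simplifying assumptions, not PyXMLConc's implementation: the helper kwic_tokens is hypothetical, tokenization is reduced to whitespace splitting, and the sample text is made up.

```python
import re


def kwic_tokens(text, term, window=2):
    """Token-based concordance: (left context, hit, right context) tuples.

    Hypothetical helper; whitespace splitting stands in for a real tokenizer.
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # The search term is treated as a regex that must match a whole token.
        if re.fullmatch(term, token):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


text = "the cat saw the cat saw the dog"

# Token mode: every matching token yields its own concordance line.
token_hits = kwic_tokens(text, r"cat")

# re.findall mode: matches never overlap, so the second occurrence of the
# phrase (which starts inside the first match) is not reported.
plain_hits = re.findall(r"the cat saw the", text)

# One possible workaround: a zero-width lookahead with a capturing group,
# which lets the scan resume one character after each match start.
overlapping_hits = re.findall(r"(?=(the cat saw the))", text)

print(len(token_hits), len(plain_hits), len(overlapping_hits))
```

The lookahead pattern is only one way to recover overlapping hits; whether it is appropriate depends on how the surrounding context is extracted for each match.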

Todo

  • Add additional tests
  • Automatically center the scrollbar
  • Display the frequency table as an actual table
  • Allow search from the frequency table
  • Color the actual search term; split up the concordance into columns
  • Select search terms from the frequency list
  • Fix issues when there are multiple attributes

Updates

  • (2020-12-01) PyXMLConc 0.2 - Upgrade to Qt 5 and PySide2; Ensure Python 3.x compatibility; Add simple frequency tables
