The data package includes the GermaParl corpus of plenary protocols of the German Bundestag. The data has been indexed, i.e. it has been imported into the Corpus Workbench (CWB). Using the CWB keeps the data size modest, ensures performance, exposes the CQP query syntax, and creates opportunities to combine quantitative and qualitative approaches to analysing text.

The GermaParl package is designed to be used with polmineR as a toolset for standard qualitative and quantitative tasks in text analysis (counts, dispersion, ngrams, cooccurrences, viewing concordances, and going back to the original full text). Using polmineR, you can easily generate data structures (such as term-document matrices) that are required as input for advanced statistical procedures.
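For readers unfamiliar with the term, a term-document matrix simply records how often each term occurs in each document. The following is a toy illustration in base R only: the two "speeches" are invented, and the code is not how polmineR builds its matrices.

```r
# Two invented mini-documents (not taken from GermaParl).
docs <- c(
  speech_a = "wir brauchen mehr integration",
  speech_b = "integration braucht zeit und mehr geld"
)

# Split each document into tokens on single spaces.
tokens <- strsplit(docs, " ", fixed = TRUE)

# The vocabulary is the sorted set of all distinct tokens.
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document: rows are terms, columns are documents.
tdm <- vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
)
rownames(tdm) <- vocab

print(tdm)
```

Matrices of this shape, usually stored in a sparse format, are the typical input for procedures such as topic modelling or clustering.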

### Installation

The GermaParl package is hosted at a private CRAN-style package repository on the web server of the PolMine Project. The polmineR package offers a convenient installation mechanism.

```r
library(polmineR)
install.corpus("GermaParl") # bulky data, so this may take a while
```

### Using GermaParl

To check whether the installation has been successful, run the following commands. For further instructions, see the documentation of the polmineR package.

```r
use("GermaParl") # to activate the corpus in the data package
corpus() # to see whether the GERMAPARL corpus is listed
size("GERMAPARL") # to learn about the size of the corpus
```
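Once the corpus is listed, a first analysis might look like the sketch below. It assumes that polmineR and the GermaParl data are installed as described above. The query term "Integration" is an arbitrary example, and argument names can differ between polmineR versions, so check the package documentation for the version you have installed.

```r
library(polmineR)
use("GermaParl") # activate the corpus in the data package

# Count the occurrences of a single token in the whole corpus.
hits <- count("GERMAPARL", query = "Integration")

# The same count using CQP syntax: a regular expression that matches
# every token starting with "Integration".
hits_cqp <- count("GERMAPARL", query = '"Integration.*"', cqp = TRUE)

# Keyword-in-context view (concordances) for qualitative inspection.
concordances <- kwic("GERMAPARL", query = "Integration")

print(hits)
print(hits_cqp)
```

Printing `concordances` in an interactive session displays the matches together with their left and right context, which is the step that links the counts back to the text.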

### License

The data comes with a CLARIN PUB+BY+NC+SA license. That means:

PUB - The language resource can be distributed publicly.

BY - Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

NC - NonCommercial — You may not use the material for commercial purposes.

SA - ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.