Cluster wrapper function

cluster(data, ..., n_clusters, minimum_term_frequency = 3, min_terms = 3,
  num_terms = 10, stopwords = NULL, remove_twitter = FALSE)

Arguments

data

The data frame containing the text vector as the first column

...

Additional columns of the data frame containing metadata for comparison

n_clusters

The number of clusters to be used for the clustering solution

minimum_term_frequency

The minimum number of occurrences for a term to be included

min_terms

The minimum number of terms for a document to be included

num_terms

Number of terms to display in clustering summary output

stopwords

Additional stopwords to exclude from clustering analysis

remove_twitter

Whether to remove text associated with Twitter content; useful when analyzing data from this source (defaults to FALSE)

Details

Performs the clustering half of the process: assembling and cleaning the corpus, deviationalizing, and clustering.
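The steps named above can be sketched in base R on a toy corpus. This is a minimal illustration under stated assumptions, not the package's implementation: "deviationalizing" is read here as subtracting the corpus mean vector from each document's term counts, the clustering step uses Ward's hierarchical method from the stats package, and `docs`, `stop_terms`, and the threshold values are hypothetical. The filters mirror the `minimum_term_frequency` and `min_terms` arguments, with `min_terms` assumed to count the terms remaining in a document after filtering.

```r
# Toy corpus (hypothetical); cluster() itself takes a data frame with text first.
docs <- c(
  "the people govern the nation and the people vote",
  "the nation will govern and the people will vote",
  "world peace and world trade help the world",
  "trade and peace will help the world trade",
  "ok"
)
stop_terms <- c("the", "and")   # illustrative stopword list
minimum_term_frequency <- 2     # toy thresholds, smaller than the defaults
min_terms <- 3

# 1. Assemble and clean: lowercase, split on non-letters, drop stopwords.
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(tok) {
  tok[nzchar(tok) & !(tok %in% stop_terms)]
})

# 2. Build a document-by-term count matrix.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# 3. Drop rare terms, then documents left with too few terms.
dtm <- dtm[, colSums(dtm) >= minimum_term_frequency, drop = FALSE]
dtm <- dtm[rowSums(dtm) >= min_terms, , drop = FALSE]

# 4. Deviation vectors (assumed meaning): subtract the corpus mean per term.
dev <- sweep(dtm, 2, colMeans(dtm))

# 5. Cluster the deviation vectors and cut the tree into n_clusters groups.
n_clusters <- 2
groups <- cutree(hclust(dist(dev), method = "ward.D2"), k = n_clusters)
print(groups)
```

In this toy run the one-word document is dropped by the `min_terms` filter, and the two governance sentences end up in a different cluster from the two trade sentences.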

Examples

library(clustRcompaR)
library(dplyr)
library(quanteda)

d <- inaugural_addresses
# Label each address by century (years before 1800 fall in the 18th century)
d <- mutate(d, century = ifelse(Year < 1800, "18th",
                         ifelse(Year >= 1800 & Year < 1900, "19th",
                         ifelse(Year >= 1900 & Year < 2000, "20th", "21st"))))
three_clusters <- cluster(d, century, n_clusters = 3)
#> Document-feature matrix of: 58 documents, 2,820 features (79.6% sparse).
extract_terms(three_clusters)
#>    Cluster.1.Terms Cluster.1.Term.Frequencies Cluster.2.Terms
#> 1               in                  34.200000              in
#> 2               my                  13.866667           their
#> 3            their                  12.333333          govern
#> 4             will                  11.200000            will
#> 5           govern                   9.533333             has
#> 6            peopl                   7.200000              it
#> 7               it                   7.133333           state
#> 8           nation                   7.000000            been
#> 9              has                   6.733333           peopl
#> 10         countri                   6.533333          nation
#>    Cluster.2.Term.Frequencies Cluster.3.Terms Cluster.3.Term.Frequencies
#> 1                    77.52941              in                  36.692308
#> 2                    22.88235            will                  16.076923
#> 3                    21.41176          nation                  12.500000
#> 4                    20.29412              us                  12.038462
#> 5                    20.00000           world                   9.807692
#> 6                    19.41176           peopl                   9.307692
#> 7                    18.23529             can                   7.769231
#> 8                    17.82353            must                   7.730769
#> 9                    16.05882         america                   7.423077
#> 10                   14.41176              no                   7.192308
three_clusters_comparison <- compare(three_clusters, "century")
compare_plot(three_clusters_comparison)