Cornell Conversational Analysis Toolkit (ConvoKit) Documentation

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included, together with scripts exemplifying the use of the toolkit on these datasets.

More information can be found at our website.

The latest version is 3.0.0 (released July 17, 2023).

Contents

Getting Started
    Installing ConvoKit
    Introductory tutorial
    Core Concepts
    Data Format
    Configurations
    Troubleshooting

API Reference
    The Corpus model
        Speaker
        Utterance
        Conversation
        Corpus
        ConvoKitMatrix
        UtteranceNode
    Transformers
        Preprocessing
        Feature extraction
        Analysis
        Transformer (base class)
    Utilities
        Util
        Speaker Conversation Utilities
    Pipeline

Datasets and Examples
    Datasets
        Conversations Gone Awry Dataset - Wikipedia version (CGA-WIKI)
        Conversations Gone Awry Dataset - Reddit CMV version (CGA-CMV)
        Cornell Movie-Dialogs Corpus
        CANDOR Corpus
        Parliament Question Time Corpus
        Wikipedia Talk Pages Corpus
        Tennis Interviews
        Reddit Corpus (all, by subreddit)
        Reddit Corpus (small)
        WikiConv Corpus
        Chromium Conversations Corpus
        Winning Arguments Corpus
        Coarse Discourse Corpus
        Persuasion For Good Corpus
        Intelligence Squared Debates Corpus
        Friends Corpus
        Spolin Corpus
        Switchboard Dialog Act Corpus
        Stanford Politeness Corpus (Wikipedia)
        Stanford Politeness Corpus (Stack Exchange)
        Deception in Diplomacy Corpus
        Group Affect and Performance (GAP) Corpus
        Supreme Court Oral Arguments Dataset
        Wikipedia Articles for Deletion Dataset
        CaSiNo Corpus
    Examples
        General ConvoKit usage (starting resource)
        Intermediate corpus functionality
        Classifier
        Coordination
        Expected Conversational Context Framework
        Fighting Words
        Forecaster
        Hyperconvo
        Politeness Strategies
        Ranker
        Speaker Convo Diversity