datascience

The SemEval-2016 dataset for the Natural Language Processing (NLP) task of Sentiment Analysis was downloaded using the organizers' recommended Python script. The script and its instructions are available at https://github.com/aritter/twitter_download. I had to comment out line 36 of download_tweets_api.py because the script was crashing:

#uid = fields[1]

The raw downloaded dataset stats are:

Environment	Task	Point scale	Number of tweets
DEV	A	Three	2000
DEV_TEST	A	Three	2000
TRAIN	A	Three	6000
TEST	A	Three	20,633
DEV	B,D	Two	1325
DEV_TEST	B,D	Two	1417
TRAIN	B,D	Two	4346
TEST	B,D	Two	10,552
DEV	C,E	Five	2000
DEV_TEST	C,E	Five	2000
TRAIN	C,E	Five	6000
TEST	C,E	Five	20,632
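
As a quick sanity check on the table above, the per-task totals can be computed with a few lines of Python. This is a minimal sketch: the counts are copied from the table by hand, and the names `ROWS` and `totals_by_task` are made up for illustration and are not part of the download script.

```python
from collections import defaultdict

# (environment, task, number of tweets), copied from the table above.
ROWS = [
    ("DEV", "A", 2000),
    ("DEV_TEST", "A", 2000),
    ("TRAIN", "A", 6000),
    ("TEST", "A", 20633),
    ("DEV", "B,D", 1325),
    ("DEV_TEST", "B,D", 1417),
    ("TRAIN", "B,D", 4346),
    ("TEST", "B,D", 10552),
    ("DEV", "C,E", 2000),
    ("DEV_TEST", "C,E", 2000),
    ("TRAIN", "C,E", 6000),
    ("TEST", "C,E", 20632),
]


def totals_by_task(rows):
    """Sum the tweet counts of all environments for each task group."""
    totals = defaultdict(int)
    for _environment, task, count in rows:
        totals[task] += count
    return dict(totals)


totals = totals_by_task(ROWS)
for task, total in totals.items():
    print(f"Task {task}: {total} tweets")
```

Tweets that were deleted or made private after the dataset was published cannot be fetched, so the counts from a fresh download may differ from the ones listed here.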
