rounded

Twittersong 1 year, 7 months ago. by mrflip

Took the 50M twitter messages we saw between mid-November and mid-January and used Wordle to make a word cloud:  http://bit.ly/tweetcloud Fun!

(If you’re not familiar with a word cloud: the larger a word, the more often it was used. The colors & positions don’t mean anything, they’re just for fun. We stripped out the little words (a, the, with, …), leaving everything that appeared more than 10,000 times in the 50 million+ tweets we examined.)

Then I looked again at the filtered list and noticed something… just awesome.

Here are the forty most-commonly used words, in their exact order of decreasing frequency:

It’s time, Twitter. Love/Christmas blog:

Home! Thanks, people…

Night post:

Getting happy
watching morning
that’s tonight.
Tomorrow: looking news, trying nice? Check.

2009: Hope.
Week: 2008.

Little video:

snow.

Live free. Life. Awesome days!

Doing:

Feel house ready.
Look cool.
Sleep.
Yeah world!

I like your poem, Twitter.
A lot.

(more…)

Massive Scrape of Twitter’s Friend Graph 1 year, 8 months ago. by mrflip

UPDATE:

We’ve posted several Twitter datasets on Infochimps. Take a look and build something cool!

UPDATE:

We’ve taken the data down for the moment, at Twitter’s request. STAY CALM. They want to support research on the twitter graph, but feel that since this is users’ data there should be terms of use in place. We’ve taken the data down while those terms are formulated. I pass along from @ev: “Thank you for your patience and cooperation.”


The infochimps have gathered a massive scrape of the Twitter friend graph.  Right now it weighs in at

  • about 2.7M users: we have most of the “giant component”
  • 10M tweets
  • 58M edges

(These and other details will be updated as further drafts are released. See below for technical info).  This is still in rough, rough draft but this dataset is so amazingly rich we couldn’t help sharing it.  We have not done all the double-checking we’d like, and the field order will change in the next (12/30) rev.  We’ll also have a much larger dump of tweets off the public datamining feed.

The data is offline at the moment pending some TOS from twitter.com. If you’re interested in hearing when it’s released, follow the low-traffic @infochimps on twitter or look for a post here.

Big huge thanks to twitter.com: they have given us permission to share this freely. Please go build tools with this data that make both twitter.com and yourself rich and famous: then more corporations will free their data.

(more…)

Austin Data Nerds RIGHT JOIN 2 years, 2 months ago. by mrflip

Do you work with huge datasets?  Are you interested in the opportunities that abound when the world’s free open data are integrated and the Semantic Web becomes a reality?  Come discuss data mashups, visualizations, and tools to organize, discover and explore rich information streams.

Join us at Mangia pizza on Guadalupe (3016 Guadalupe #100: free wifi, beer & great pizza) Thursday July 10th at 6:30pm and meet your fellow Austin data nerds.

(Also at Upcoming)