
Twittersong 1 year, 7 months ago. by mrflip

Took the 50M twitter messages we saw between mid-November and mid-January and used Wordle to make a word cloud:  http://bit.ly/tweetcloud Fun!

(If you’re not familiar with a word cloud: the larger a word, the more often it was used. The colors & positions don’t mean anything, they’re just for fun. We stripped out the little words (a, the, with, …), leaving everything that appeared more than 10,000 times in the 50 million+ tweets we examined.)
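For the curious, the counting step can be sketched in a few lines of Python. This is only an illustration, not the script we actually ran: the `top_words` function, the tiny `STOP_WORDS` set and the tokenizing regex are all assumptions made up for this example. The only things taken from the description above are the ideas of dropping the little words, keeping words that appear more than some threshold (10,000 in our case), and listing them in decreasing frequency.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a much longer one.
STOP_WORDS = {"a", "the", "with", "and", "to", "of", "in", "is", "it"}

def top_words(tweets, min_count=10_000, limit=40):
    """Return up to `limit` (word, count) pairs, most frequent first.

    Only words that are not stop words and that appear MORE than
    `min_count` times across all tweets are kept.
    """
    counts = Counter()
    for text in tweets:
        # Lowercase, then pull out runs of letters, digits and apostrophes.
        for word in re.findall(r"[a-z0-9'’]+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    # most_common() sorts by decreasing count.
    ranked = [(word, n) for word, n in counts.most_common() if n > min_count]
    return ranked[:limit]

if __name__ == "__main__":
    sample = ["The snow is here", "snow with love", "Love the snow"]
    print(top_words(sample, min_count=1))
```

On a toy sample like the one at the bottom you would lower `min_count`; the default of 10,000 only makes sense for a corpus on the scale of tens of millions of tweets.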

Then I looked again at the filtered list and noticed something… just awesome.

Here are the forty most-commonly used words, in their exact order of decreasing frequency:

It’s time, Twitter. Love/Christmas blog:

Home! Thanks, people…

Night post:

Getting happy
watching morning
that’s tonight.
Tomorrow: looking news, trying nice? Check.

2009: Hope.
Week: 2008.

Little video:

snow.

Live free. Life. Awesome days!

Doing:

Feel house ready.
Look cool.
Sleep.
Yeah world!

I like your poem, Twitter.
A lot.


Massive Scrape of Twitter’s Friend Graph 1 year, 8 months ago. by mrflip

UPDATE:

We’ve posted several Twitter datasets on Infochimps. Take a look and build something cool!

UPDATE:

We’ve taken the data down for the moment, at Twitter’s request. STAY CALM. They want to support research on the twitter graph, but feel that since this is users’ data there should be terms of use in place. We’ve taken the data down while those terms are formulated. I pass along from @ev: “Thank you for your patience and cooperation.”


The infochimps have gathered a massive scrape of the Twitter friend graph.  Right now it weighs in at

  • about 2.7M users: we have most of the “giant component”
  • 10M tweets
  • 58M edges

(These and other details will be updated as further drafts are released. See below for technical info.)  This is still a rough, rough draft, but this dataset is so amazingly rich we couldn’t help sharing it.  We have not done all the double-checking we’d like, and the field order will change in the next (12/30) rev.  We’ll also have a much larger dump of tweets off the public datamining feed.
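If “giant component” is unfamiliar: it is the largest set of users who are all reachable from one another by following friend links. Here is a minimal sketch of how one might find it from an edge list. It assumes edges arrive as `(follower, followed)` pairs and treats them as undirected (a weakly connected component); the `giant_component` function and that edge format are illustrative assumptions, not the scraper's actual code or the dataset's field layout.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the largest weakly connected set of nodes in an edge list.

    `edges` is an iterable of (follower, followed) pairs; direction is
    ignored, so two users are connected if either one follows the other.
    """
    neighbors = defaultdict(set)
    for follower, followed in edges:
        neighbors[follower].add(followed)
        neighbors[followed].add(follower)

    seen = set()
    largest = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        if len(component) > len(largest):
            largest = component
    return largest

if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
    print(sorted(giant_component(toy_edges)))
```

An in-memory adjacency map like this is fine for toy graphs. At tens of millions of edges you would more likely reach for a union-find structure or a distributed job, but the idea is the same.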

The data is offline at the moment pending some TOS from twitter.com. If you’re interested in hearing when it’s released, follow the low-traffic @infochimps on twitter or look for a post here.

Big huge thanks to twitter.com: they have given us permission to share this freely. Please go build tools with this data that make both twitter.com and yourself rich and famous: then more corporations will free their data.


Vote for our SxSW Panel Talk, Get People Thinking about how the Web will help tame the Data Flood 2 years, 0 months ago. by mrflip

Aaron Swartz of get.theinfo.org and watchdog.org, Kurt Bollacker from freebase.com, Shawn O’Connor from timepedia.org, and we infochimps have each put in panel proposals for the SxSWi 2009 conference.  Please consider clicking through to rate (and comment on!) these talks:

By my cursory count, there are about three times as many proposals this year as last that center on using the web for large-scale data exploration, data mashups, visualization, etc. Even if you are not attending, though, your vote will help get more people learning about the current state and future possibilities of massive data exploration on the web.

Descriptions of those talks:

Beyond Mashup: Weaving the Global Data Tapestry

http://panelpicker.sxsw.com/ideas/view/1500

Data mashups of not a few but a few thousand sources become possible as community efforts, enabled by new tools and Creative Commons licensing, unify the world’s exploding store of free, open data. Come find out what’s awesome, what’s hard, and what’s possible when you discover there’s really only one dataset. (P Kromer, infochimps.org)

How the Internet is Transforming Governance

http://panelpicker.sxsw.com/ideas/view/1038

The Internet is starting to revolutionize everything about politics and governance. Panelists will discuss new initiatives that harness the power of the Web to engage citizens in online activism, collaborative governance and oversight in ways that are radically shifting political power structures and fostering more transparency and accountability by elected officials. (Gabriela Schneider, Sunlight Foundation)

Petabyte as Platform – Building “Everything about Something” Sites

http://panelpicker.sxsw.com/ideas/view/1449

Find a topic some audience cares deeply about: their neighborhood, our government, every motorcycle ever made; and let visitors see, explore and understand it, and you make the world a better place. We’ll discuss how participating in the open, global data commons beneficially transforms our culture and economy. (Kurt Bollacker, Freebase.com)

Powers of Often: Powers of Ten in Time

http://panelpicker.sxsw.com/ideas/view/1649

In 1977, Charles & Ray Eames made a fascinating short film, Powers of Ten, showing the relative scales in the universe: from picnic, to city, to solar system, to galaxy, and so on, back to cells, molecules, and atomic nuclei. In the same spirit, Powers of Often will explore relative scales in time using real data and hard estimates: patterns of daily life, demographics, census data, generations, long term trends, forecasts, historical cycles, high-frequency finance, and solar cycles. (Shawn O’Connor, Timepedia.org)

Among all talks with “data” in the description, these also look interesting:

If you see any other worthwhile topics please reply.

Thanks!
flip

Stock Market dataset is up 2 years, 5 months ago. by mrflip

40 Years of data on every NYSE, AMEX and NASDAQ listed stock:

These links were busted before but should be worky now.