Identifying the Productive and Influential Bloggers in a Community: The Engadget Dataset

In our paper we introduced the BP-Index and the BI-Index, two metrics for identifying the most productive and the most influential bloggers of a community blog.

One of the datasets we used is a crawl of the Engadget blog community performed on April 2, 2010. Here we provide copies of the MySQL database files for this dataset. All you have to do is download the compressed database files and extract them into the data/ directory of your MySQL installation. You should then be able to access the Engadget database immediately through your MySQL management interface.
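The installation step above can be scripted. The following is a minimal sketch, not part of the original distribution: the function name install_dataset, the archive file name, and the data directory path are all assumptions to be replaced with the file you downloaded and the data directory of your own MySQL installation. It relies on Python's shutil.unpack_archive, which infers the archive format (zip, tar, tar.gz, and so on) from the file extension.

```python
import shutil
from pathlib import Path


def install_dataset(archive_path, data_dir):
    """Extract the downloaded archive into the MySQL data directory.

    Hypothetical helper: both paths are assumptions; use the name of the file
    you downloaded and the data directory of your own MySQL installation.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"MySQL data directory not found: {data_dir}")
    # unpack_archive chooses the format (zip, tar, tar.gz, ...) from the
    # archive's file extension.
    shutil.unpack_archive(str(archive_path), str(data_dir))
    # Return the entries now present in the data directory for a quick check.
    return sorted(p.name for p in data_dir.iterdir())
```

For example, install_dataset("engadget.zip", "/var/lib/mysql") would extract a (hypothetically named) archive into /var/lib/mysql, a common default data directory on Linux. Depending on your setup, the extracted files may also need to be readable by the user the MySQL server runs as.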

The database consists of four tables:

  • authors: contains the 93 bloggers of Engadget.
  • inlinks: contains linking information (i.e., the incoming links) for the Engadget posts. The table stores 319,880 records.
  • posts: contains the 63,359 Engadget posts along with their metadata (blogger ID, publication date, number of comments, etc.).
  • comments: stores all the comments made on the Engadget posts (3,672,819 records).
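After installation, a quick sanity check is to compare the row count of each table with the figures listed above. This is a minimal sketch under stated assumptions: the helper count_mismatches is hypothetical, and it accepts any DB-API 2.0 connection (for example one opened with a MySQL driver such as mysql-connector-python or PyMySQL). Only COUNT(*) queries are issued, so no assumptions about column names are needed.

```python
# Row counts documented for the four tables of the Engadget database.
EXPECTED_COUNTS = {
    "authors": 93,
    "inlinks": 319_880,
    "posts": 63_359,
    "comments": 3_672_819,
}


def count_mismatches(conn, expected=EXPECTED_COUNTS):
    """Return {table: (expected, actual)} for every table whose row count differs.

    Hypothetical helper: `conn` is any DB-API 2.0 connection to the database.
    An empty dictionary means every checked table has the documented size.
    """
    mismatches = {}
    cur = conn.cursor()
    try:
        for table, want in expected.items():
            # Table names come from the fixed dictionary, not from user input,
            # so formatting them into the query string is safe here.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            (got,) = cur.fetchone()
            if got != want:
                mismatches[table] = (want, got)
    finally:
        cur.close()
    return mismatches
```

With mysql-connector-python, for instance, you could open the connection with mysql.connector.connect(user=..., password=..., database=...), using whatever name the Engadget database has on your server; if the installation succeeded, the function should return an empty dictionary.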

You can download the dataset by clicking here (359.6 MB).

If you need the dataset in another format (e.g., XML or JSON), please feel free to contact me.

Note: Researchers who have used, or will use, this dataset are kindly asked to cite the following article in their work:
L. Akritidis, D. Katsaros, P. Bozanis, "Identifying Influential Bloggers: Time Does Matter", In Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 76-83, 2009.