Monday, November 04, 2013

Sunandan Chakraborty: Big Data Analytics for Development

Sunandan Chakraborty gave a talk entitled "Big Data Analytics for Development" in the HIIT Otaniemi seminar series. Chakraborty is presently doing an internship at Microsoft Research Cambridge and is a PhD candidate in the Computer Science department of New York University.

Chakraborty's research includes the analysis of parallel streams of datasets from the web to infer and forecast macroeconomic and societal indicators. At present, the main focus is to extract events and infer spatio-temporal relationships between events using online news articles and other web-based sources. As related work, Chakraborty discussed, for instance, United Nations Global Pulse, which gathers timely information to track and monitor the impacts of global and local socio-economic crises, and the use of Google Earth to study flood hazards.

Potential sources of useful data include news articles, blogs, social media, online image data and mobile call records. These can be used to infer socio-economic indices. Challenges with these data sources include biases, incomplete coverage, the influence of personal opinions, noise, non-uniform quality, cost, privacy issues, and limited availability. Chakraborty described five projects and their results:

  • development of a location specific summarization tool
  • development of diagnostic tools for online textbooks
  • computing the rate of cropland disappearance using Google Earth satellite images
  • extracting structure from unstructured text
  • using mobile apps to collect data

Chakraborty reported results related to the design and use of a system that mines news articles, blogs and other information sources on the web to automatically summarize important climatic and agricultural trends and to construct a location-specific climatic and agricultural information portal. This system has been described in the article "Location specific summarization of climatic and agricultural trends" by Chakraborty and Subramanian. The idea has been to collect topic-specific information on, for example, erosion, infertility, scarcity, drought and floods in different locations. The authors have evaluated the system across 605 different districts in India.

An interesting example of a diagnostic tool for online textbooks is described in the article "Empowering authors to diagnose comprehension burden in textbooks". The authors (Agrawal, Chakraborty et al.) mine textbooks to identify sections and concepts that could benefit from reorganization. With their method, textbook authors can quantitatively assess the burden that a textbook imposes on the reader due to non-sequential presentation of concepts. They have applied the tool to a corpus of high school textbooks that are in active use in India. This method could potentially be used in a complementary fashion with a related method described in our article "Assessing user-specific difficulty of documents".
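To make the idea of burden from non-sequential presentation more concrete, here is a toy sketch in Python. It is not the method from the paper, whose details were not covered here. It simply assumes that each section comes with a hand-made set of concepts it introduces and a set of concepts it uses, and it lists the concepts a section relies on before the book has introduced them. The function name, the data layout and the example sections are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: (section name, concepts introduced, concepts used).
Section = Tuple[str, Set[str], Set[str]]


def forward_reference_burden(sections: List[Section]) -> Dict[str, List[str]]:
    """For each section, list concepts used before the section introducing them.

    Toy measure only: concepts that are never introduced anywhere in the
    book are ignored rather than counted as burden.
    """
    # Index of the first section that introduces each concept.
    first_defined: Dict[str, int] = {}
    for idx, (_, defined, _) in enumerate(sections):
        for concept in defined:
            first_defined.setdefault(concept, idx)

    burden: Dict[str, List[str]] = {}
    for idx, (name, _, used) in enumerate(sections):
        # A concept is a forward reference if it is introduced in a later section.
        burden[name] = sorted(
            c for c in used if first_defined.get(c, -1) > idx
        )
    return burden


# Made-up example data, not taken from any real textbook.
sections: List[Section] = [
    ("1 Motion", {"velocity"}, {"velocity", "force"}),
    ("2 Forces", {"force", "mass"}, {"force", "velocity"}),
    ("3 Energy", {"energy"}, {"energy", "mass", "momentum"}),
]

if __name__ == "__main__":
    for name, concepts in forward_reference_burden(sections).items():
        print(name, concepts)
```

In this made-up example only the first section carries any burden, because it uses "force" one section before it is introduced. A real tool would also need to extract the concepts from the text automatically, which is the hard part.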

Chakraborty discussed the fact that less and less land is used for agriculture. In India, farmers protest land acquisitions. Climate change also affects the global agricultural situation. In their paper "Computing the rate of disappearance of cropland using satellite images", Chakraborty and his colleagues present a tool that can monitor this change through satellite images. Google Earth offers a huge corpus of satellite images from across the globe, which they have used in the analysis to distinguish between arable, barren, tree-covered and developed land areas. This application area has, of course, a long history in pattern recognition and image analysis research (consider, e.g., the paper "A Comparative Study of Texture Measures for Terrain Classification" from 1976). Computational sustainability is naturally becoming more and more relevant.

Chakraborty also described a current project in which archived news data is analyzed to detect events and to study the spatio-temporal relationships between them. The work currently focuses on India and Indian news articles.
