I. Introduction
Machine learning tools and techniques are becoming prevalent in data-intensive software projects. Among various languages used for Data Science, Python has become one of the most popular languages because of its large collection of libraries to organize and analyze data. Mining such Python projects would be helpful to improve language design, library enhancements, bug detection as well as open new research directions. For example, by analyzing programs from thousands of Data Science projects, we can suggest the best library for performing specific tasks, find recurrent bugs, improve certain APIs, etc.