Data Schema to Formalize Education Research & Development Using Natural Language Processing | IEEE Conference Publication | IEEE Xplore


Abstract:

Our work aims to aid the development of an open source data schema for educational interventions by applying natural language processing (NLP) techniques to publications within What Works Clearinghouse (WWC) and the Education Resources Information Center (ERIC). A data schema demonstrates the relationships between individual elements of interest (in this case, research in education) and collectively documents those elements in a data dictionary. To facilitate the creation of this educational data schema, we first run a two-topic latent Dirichlet allocation (LDA) model to compare the titles and abstracts of papers that met WWC standards without reservation with those of papers that did not, separated by math and reading subdomains. We find that the distributions of allocation to these two topics suggest structural differences between WWC and non-WWC literature. We then apply Term Frequency-Inverse Document Frequency (TF-IDF) scoring to study the vocabulary within WWC titles and abstracts and determine the most relevant unigrams and bigrams currently present in WWC. Finally, we use an LDA model again to cluster WWC titles and abstracts into topics, or sets of words, grouped by underlying semantic similarities. We find that 11 is the optimal number of topics in WWC, with an average coherence score of 0.4096 among the 39 out of 50 models that returned 11 as the optimal number of topics. Based on the TF-IDF and LDA methods presented, we can begin to identify core themes of high-quality literature that will better inform the creation of a universal data schema within education research.
Date of Conference: 29-30 April 2021
Date Added to IEEE Xplore: 16 July 2021
Conference Location: Charlottesville, VA, USA
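
The TF-IDF step mentioned in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names (`ngrams`, `tfidf`, `top_terms`), the simple letter-only tokenizer, the unsmoothed weighting tf × ln(N/df), and the three toy documents are all assumptions made for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant by default, so their scores would differ.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercase unigrams plus n-grams up to n_max (assumed tokenizer: letters only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Score each term in each document as (relative term frequency) * ln(N / document frequency)."""
    term_lists = [ngrams(d, n_max) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears at least once.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))
    scores = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        scores.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores


def top_terms(docs, k=5):
    """Rank terms by their TF-IDF score summed over all documents."""
    totals = Counter()
    for doc_scores in tfidf(docs):
        totals.update(doc_scores)  # adds each term's score to its running total
    return [term for term, _ in totals.most_common(k)]


# Hypothetical toy titles, not drawn from WWC or ERIC.
docs = [
    "reading intervention improves reading fluency",
    "math intervention improves math achievement",
    "tutoring intervention improves attendance",
]
print(top_terms(docs, k=4))
```

In this toy corpus, terms that occur in every document, such as "intervention", receive a score of zero because ln(3/3) = 0, while terms concentrated in one document rank highest. This is the property that makes TF-IDF useful for surfacing distinctive unigrams and bigrams.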

I. Introduction

In the field of education, there is currently a gap between research and practice [5]. NewSchools worked with Gallup to ask a sample of 3,210 teachers, 1,163 principals, 1,219 administrators, and 2,696 students what they think of education technology and how they use it inside and outside of the classroom [9]. They found that teachers, principals, and administrators all trust teachers the most for recommendations on education technology. Teachers ranked research papers low on this list because they place little trust in reports that were planned and funded by the companies themselves. Teachers can also find these papers difficult to understand because the researchers' language is not necessarily the same as the educators' language. Laura Hamilton and Gerald Hunter (2020) saw the same trend with educational interventions, as opposed to education technology: teachers tend to turn to other teachers for suggestions on academic interventions [5]. This points to a gap between research and practice: while researchers are writing reports about tools to be used in academia, educators are not fully translating this research into practice and are thus not using these tools in the classroom.

