Maize block "M" and "Multidisciplinary Design Program, University of Michigan"

ProQuest-23

ProQuest, part of Clarivate, is an educational technology company committed to empowering researchers and librarians around the world. The company’s portfolio of assets, including content, technologies, and deep expertise, drives better research outcomes for users and greater efficiency for the libraries and organizations that serve them. Students on this team will use machine learning techniques to deliver a proof-of-concept system that predicts the future success of academic researchers based on information available near the time of dissertation publication.

Abstract:

This project brings together two major research datasets, namely ProQuest’s Dissertations and Theses and Clarivate’s Web of Science. Each of these datasets is unique and valuable for understanding both the start and subsequent success of academic researchers. Can we predict a researcher’s success from their dissertation? When does a research career peak? Does this vary by field? Do superstar researchers come from within or from outside? What are some of the biases and pitfalls of different definitions of success?

To support the proof of concept, this project will:

  • Model academic research success as a machine learning problem.
  • Formulate predictive features from a variety of data sources.
  • Leverage textual data to exploit word embeddings where available and appropriate.
  • Measure and understand how prediction accuracy depends on the amount of time that has passed since the publication date of the researcher’s dissertation.
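
As a rough illustration of the last item, the sketch below measures how the accuracy of a simple baseline changes with the number of years elapsed since the dissertation. Everything in it is an assumption made for illustration: the toy citation counts, the horizons, the success threshold of 50 citations, and the linear-extrapolation baseline are hypothetical and are not taken from the project description.

```python
from typing import Dict, List

# Hypothetical toy data: cumulative citation counts observed 1, 3, 5 and 10
# years after the dissertation date. The numbers are made up for illustration.
HORIZONS: List[int] = [1, 3, 5, 10]
CUMULATIVE_CITATIONS: Dict[str, List[int]] = {
    "r1": [0, 2, 10, 60],
    "r2": [1, 3, 4, 6],
    "r3": [5, 20, 45, 120],
    "r4": [0, 0, 1, 3],
    "r5": [2, 4, 30, 80],
    "r6": [3, 5, 6, 9],
}
# Assumed definition of "success": at least 50 citations at the last horizon.
SUCCESS_THRESHOLD = 50


def label(history: List[int]) -> bool:
    """Ground truth under the assumed definition of success."""
    return history[-1] >= SUCCESS_THRESHOLD


def predict(history: List[int], horizon_index: int) -> bool:
    """Baseline: extrapolate the citation rate seen so far to the last horizon."""
    years_elapsed = HORIZONS[horizon_index]
    rate = history[horizon_index] / years_elapsed
    return rate * HORIZONS[-1] >= SUCCESS_THRESHOLD


def accuracy_by_horizon(data: Dict[str, List[int]]) -> Dict[int, float]:
    """Accuracy of the baseline when only data up to each horizon is visible."""
    result: Dict[int, float] = {}
    for index, years in enumerate(HORIZONS):
        correct = sum(predict(h, index) == label(h) for h in data.values())
        result[years] = correct / len(data)
    return result


if __name__ == "__main__":
    for years, acc in accuracy_by_horizon(CUMULATIVE_CITATIONS).items():
        print(f"{years:>2} years after dissertation: accuracy = {acc:.2f}")
```

In a real study the baseline would be replaced by a trained model and the toy dictionary by held-out researcher records, but the structure of the measurement (one accuracy figure per elapsed-time horizon) would stay the same.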

Data sources to exploit will include, but not be limited to:

  • ProQuest Dissertations – available data includes the author, advisor, committee members, subject terms, author-supplied keywords, university, department, references, and text of the dissertation abstract.
  • Web of Science – available data includes author, title, abstract, publication, publication date, references, citation counts and usage counts.
  • Historical information about searches and their frequencies on the ProQuest Platform.
  • Outside data sources which can be freely obtained and have relevant time period information.
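
One way the dissertation fields listed above could be turned into predictive features is sketched below. The record layout, the field names, and the choice of features are hypothetical; the actual ProQuest schema is not described here, and a real feature set would likely be much richer (for example, text embeddings of the abstract).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DissertationRecord:
    """Hypothetical record layout; field names are assumptions, not a real schema."""
    author: str
    advisor: str
    university: str
    department: str
    abstract: str
    committee_members: List[str] = field(default_factory=list)
    subject_terms: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


def extract_features(record: DissertationRecord) -> Dict[str, float]:
    """Map one record to a flat dictionary of simple numeric features."""
    tokens = record.abstract.lower().split()
    return {
        "committee_size": float(len(record.committee_members)),
        "num_subject_terms": float(len(record.subject_terms)),
        "num_keywords": float(len(record.keywords)),
        "num_references": float(len(record.references)),
        "abstract_length": float(len(tokens)),
        # Share of distinct tokens in the abstract; 0.0 for an empty abstract.
        "abstract_vocab_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    example = DissertationRecord(
        author="A. Author",
        advisor="B. Advisor",
        university="Example University",
        department="Computer Science",
        abstract="Graph models for graph data",
        committee_members=["C. Member", "D. Member"],
        keywords=["graphs"],
    )
    print(extract_features(example))
```

A dictionary of named numeric features like this can be passed to most tabular machine learning libraries after conversion to a matrix.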

    Impact:

    The ability to predict up-and-coming academic superstars can be leveraged across ProQuest’s suite of products to support both librarians making acquisition decisions and the researchers who use the acquired assets.

    Natural Language Processing (2 Students) 

    Specific Skills:

    Strong interest in Natural Language Processing and Statistical Language Modeling. Please highlight your experience in your personal statement.

    Likely Majors: CS, EE, Math, CE, DATA

    Machine Learning (2 Students)

    Specific Skills:

    Experience with or strong interest in Machine Learning

    Likely Majors: CS, EE, ROB, Math, CE, DATA

    Programming (2-3 Students)

    Specific Skills:

    Solid programming experience: EECS 281 (or equivalent)

    Key Skills: Python

    Likely Majors: CS, DATA

    Sponsor Mentor

    John Dillon

    Text and Data Mining Product Manager

    John Dillon, Ph.D., is the Text and Data Mining Product Manager at ProQuest. His work focuses on pairing computational text analysis methods with traditional Humanities and Cultural Studies disciplines. He has published papers on Machine Learning and Sentiment Analysis, and worked previously as a postdoctoral researcher with the University of Notre Dame, USAID, and IBM Research.

    Executive Mentor

    Dan Hepp

    Data Science Lead

    Dan has thirty years of experience in research and production settings developing complex systems. He has a demonstrated track record of finding creative solutions to difficult technical problems and making them effective in real-world situations. Dan has expertise in machine learning, data mining, information extraction, pattern recognition, information retrieval, natural language processing, computer vision, artificial intelligence, and optical character recognition.

    Faculty Mentor

    Sindhu Kutty

    Electrical Engineering and Computer Science

    Dr. Kutty is a faculty member in the Computer Science department at the University of Michigan, where her primary focus is on undergraduate teaching and research. Her research interests are in applications of Machine Learning (including in Economics), fairness in Machine Learning, and Computer Science Education. She is passionate about getting undergraduate students excited about venturing beyond the course curriculum, and works with them to channel that excitement into publishable research. Her research work, both with undergraduate students and with other collaborators, has been recognized by awards at various conferences and competitions. Most recently, undergraduate work that she mentored has been recognized with first place awards at the international research competition at Project X and at the ACM undergraduate research competition at the Grace Hopper Conference.

    Course Substitutions: CE MDE, ChE Elective, CS Capstone/MDE, DS Capstone, EE MDE, CoE Honors, ROB 590, SI Grad Cognate

    Citizenship Requirements:

    • This project is open to all students.
    • International students on an F-1 visa will be required to declare part-time CPT during the Winter 2023 and Fall 2023 terms.

    IP/NDA: Students will sign standard University of Michigan IP/NDA documents.

    Internship/Summer Opportunity: No summer activity will take place on the project.

    (734) 763-0818
    117 Chrysler Center

    © University of Michigan
