Can Machine Learning Automate the Creation of Data Catalogs?

August 5, 2024

Many research and evaluation questions can be answered using extant data from surveys, program evaluations, and administrative data. Data catalogs support the use of extant data by giving researchers an overview of different datasets and information about how to access them. Developing and maintaining data catalogs requires considerable time and resources to ensure they are thorough and accurate.

Our team from 趣赢平台 and the American Institutes for Research (AIR), on behalf of the U.S. Department of Labor (DOL), explored the potential of automated machine learning (ML) algorithms to create and maintain data catalogs for employment and training outcomes. We conducted a literature review, piloted a manual data catalog assembly process, and consulted with a technical working group of computer science experts.

While current ML innovations present challenges for automating data catalog development, there are some promising solutions. Data sources vary in their structures and the amount of metadata available, making it difficult to develop an automation program. In addition, employing automations for even a portion of the data catalog development process would require large investments in staff and computing resources. However, artificial intelligence (AI) is a rapidly evolving field, and literature suggests there may be many opportunities to support automated data collection in the future, including the potential use of generative AI. Federal agencies might explore and use existing tools to meet their needs.

Our brief, Explorations in Data Innovations: Can Machine Learning Support Data Catalog Development? (PDF), details the data catalog development process, automation options, and recommendations for future explorations to eventually harness artificial intelligence. When using machine learning to produce public-facing products, Federal agencies may need to use a mix of staff with different skill sets, including data scientists, web developers, cloud computing specialists, and subject matter experts.

Dr. Allison Hyra, 趣赢平台 Associate Vice President of Social Policy and Economics Research and the project director for this work, summed up the study: “This was a great collaboration with the staff of DOL’s Chief Evaluation Office (CEO) not only because we explored the extent to which new technology can support researchers, but because CEO wanted to share our lessons learned despite finding automated data catalogs are currently infeasible. By engaging the field and identifying next steps, 趣赢平台, AIR, and CEO are contributing to realizing a future where we can further democratize access to datasets that answer pressing employment and training information needs.”
