Data Quality Over Quantity or Data Selection for Data-Centric AI // Cody Coleman // Coffee Sessions #59

Coffee Sessions #59 with Cody Coleman, Data Quality Over Quantity or Data Selection for Data-Centric AI.

// Abstract
Big data has been critical to many of the successes in ML, but it brings its own problems. Working with massive datasets is cumbersome and expensive, especially with unstructured data like images, videos, and speech. Careful data selection can mitigate the pains of big data by focusing computational and labeling resources on the most valuable examples. Cody Coleman, a recent Ph.D. from Stanford University and founding member of MLCommons, joins us to describe how a more data-centric approach that focuses on data quality rather than quantity can lower the AI/ML barrier. Instead of managing clusters of machines and setting up cumbersome labeling pipelines, you can spend more time tackling real problems.

// Bio
Cody Coleman recently finished his Ph.D. in CS at Stanford University, where he was advised by Professors Matei Zaharia and Peter Bailis. His research spans from
