The 3rd Workshop on Data Science with Human in the Loop @ KDD 2021
Program (August 15, 2021. All times are PDT.)
7:45 – 8:00: Workshop introduction
Lucian Popa, IBM Research
Title: Elevating the role of the human in human-in-the-loop learning
Abstract: The evolution of deep learning has eliminated the need for the human ingenuity and domain expertise that once went into designing informative features and pipelines around light statistical models. Challenging tasks like image recognition, speech recognition, and translation can now be learned via generic end-to-end models: you feed in raw input and get back the prediction. The primary role of humans in this learning loop is providing labeled examples, and anyone who has actually labeled examples can attest to the mind-numbing tedium of the task. While techniques like active learning attempt to reduce the number of labeled examples needed, we ask whether we can elevate the role of humans to one commensurate with their capacity for higher-level abstraction. We present multiple paradigms of high-level human supervision, including top-down rules with quality guides and bottom-up rules with exemplars. We discuss algorithms for learning deep models from such noisy yet efficient modes of supervision.
Session Chair: Yunyao Li, IBM Research
9:00 – 10:00: Paper Presentation (5 papers)
Hang Jiang and Doug Beeferman. Topic-time heatmaps for human-in-the-loop topic detection and tracking
Sumanth Prabhu, Moosa Mohamed and Hemant Misra. Multi-class Text Classification using BERT-based Active Learning
Donato Tiano, Angela Bonifati and Raymond Ng. Human-Centered Clustering for Time Series Data
Diarmuid Cahalane, Aisling Nugent, Paul Sweeney and Justin Walker. Skills Explorer: A Human+AI Approach to Data Capture for Skills
Harald Hammarström. Gramfinder: Human and Machine Reading of Grammatical Descriptions of the Languages of the World
Session Chair: Slobodan Vucetic, Temple University
10:00 – 10:10: Break
10:10 – 11:00: Invited papers (2 paper highlights from recent conferences focused on Computer Human Interaction)
Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach and Jennifer Wortman Vaughan. Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning
Q. Vera Liao. Question-Driven eXplainable AI: Re-framing the Technical and Design Spaces of XAI
Session Chair: Slobodan Vucetic, Temple University
11:00 – 13:00: Lunch Break
Title: On the Power of Human Guidance at Turning Unstructured Text to Structured Knowledge (slides)
Abstract: Real-world big data consist largely of dynamic, interconnected, unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, but these approaches do not scale. We envision that massive text data may itself disclose a large body of hidden structures and knowledge. With pretrained language models and text embedding methods, transforming unstructured data into structured knowledge becomes a promising prospect. At the same time, human guidance may still play a critical role in this process. In this talk, we study how minor human guidance can play a big role in discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach combined with minimal human guidance is promising for transforming massive text data into structured knowledge.
Session Chair: Eduard Dragut, Temple University
14:00 – 14:50: Invited Talks
Eser Kandogan, Megagon Labs
Title: Human(s)-in-the-Loop(s): Observations from the Data Science Practice
Abstract: Over the last several years at Megagon Labs, we have conducted several data science projects, ranging from exploratory to production work. Examining them from a human-computer interaction perspective, we observed that human-in-the-loop is very much present in data science practice. In fact, I would argue that there are many loops and many humans, with different kinds of roles and inputs into the practice, which shape how machine learning solutions are developed and deployed in practical settings. In this talk I will present some of the patterns we observed in data science practice, and how human(s)-in-the-loop(s) affected projects that leveraged traditional machine learning algorithms as well as advanced neural network architectures.
Title: Open Problems in Human-in-the-Loop Machine Learning
Abstract: This talk will feature excerpts from my recently published book "Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI". I'll cover some of the most exciting problems in Human-in-the-Loop Machine Learning and promising recent advances that address some of these problems. The talk will start with one of the most basic and long-standing questions in machine learning: what are the different ways that we can interpret uncertainty in our models? The talk will then discuss recent advances in transfer learning, including active transfer learning for adaptive sampling and the implications of intermediate task transfer learning on the choice of annotation task and annotation workforce(s). Finally, I will talk about advances in annotation quality control and annotation interfaces, including ways to identify annotators with rare but valid subjective interpretations and human-computer interaction strategies for combining machine learning predictions with human annotations.
Session Chair: Eduard Dragut, Temple University
14:50 – 15:00: Break
15:00 – 16:00: Panel: Open challenges in human-computer cooperation in data science
Q. Vera Liao, IBM Research
Eser Kandogan, Megagon
Session Chair: Yunyao Li, IBM Research