Developing conversational agents for use in criminal investigations
SAM HEPENSTAL, Defence Science Technology Laboratory, UK
LEISHI ZHANG, Middlesex University London, UK; now at Canterbury Christ Church University, Kent, UK
NEESHA KODAGODA, Middlesex University London, UK
B.L. WILLIAM WONG, Middlesex University London, UK
The adoption of artificial intelligence (AI) systems in environments that involve high-risk and high-consequence decision making is severely hampered by critical design issues. These issues include system transparency and brittleness, where transparency relates to (i) the explainability of results and (ii) the ability of a user to inspect and verify system goals and constraints, and brittleness relates to (iii) the inability of a system to adapt to new user demands. Transparency is a particular concern for criminal intelligence analysis, where significant ethical and trust issues arise when algorithmic and system processes are not adequately understood by a user. This prevents adoption of potentially useful technologies in policing environments. In this paper, we present a novel approach to designing a conversational agent (CA) AI system for intelligence analysis that tackles these issues. We discuss the results and implications of three different studies: a Cognitive Task Analysis to understand analyst thinking when retrieving information in an investigation, an Emergent Themes Analysis to understand the explanation needs of different system components, and an interactive experiment with a prototype conversational agent. Our prototype conversational agent, named Pan, demonstrates transparency provision and mitigates brittleness by evolving new CA intentions. We encode interactions with the CA using human factors principles for situation recognition and use interactive visual analytics to support analyst reasoning. Our approach enables complex AI systems, such as Pan, to be used in sensitive environments, and our research has broader application than the use case discussed.
Also available from https://researchspace.canterbury.ac.uk/8z653/developing-conversational-agents