Australian government trial finds AI is much worse than humans at summarizing

submitted by
Style Pass
2024-09-05 09:00:03

ASIC's proof-of-concept study (PDF), which was run in January and February, written up in March, and published in response to a Senate inquiry in May, has a number of limitations that make it hard to generalize about the summarizing capabilities of today's state-of-the-art LLMs. Still, the government study shows many of the potential pitfalls large organizations should consider before simply inserting LLM outputs into existing workflows.

For its study, ASIC teamed up with Amazon Web Services to evaluate LLMs' ability to summarize "a sample of public submissions made to an external Parliamentary Joint Committee inquiry, looking into audit and consultancy firms." For ASIC's purposes, a good summary of one of these submissions would highlight any mention of ASIC, any recommendations for avoiding conflicts of interest, and any calls for more regulation, all with references to page numbers and "brief context" for explanation.

ASIC used five "business representatives" to evaluate the LLM's summaries of five submitted documents against summaries prepared by a subject matter expert (the evaluators were not aware of the source of each summary). The AI summaries were judged significantly weaker across all five metrics used by the evaluators, including coherency/consistency, length, and focus on ASIC references. Across the five documents, the AI summaries scored an average total of seven points (on ASIC's five-category, 15-point scale), compared to 12.2 points for the human summaries.
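To put those two averages on a common footing, the short sketch below converts each to a share of the 15-point maximum and computes the gap between them. It uses only the figures quoted above; the helper name `share_of_max` is made up for illustration and is not part of ASIC's methodology.

```python
# Figures quoted in the article: average totals on a 15-point scale.
MAX_TOTAL = 15.0
AI_AVG = 7.0
HUMAN_AVG = 12.2


def share_of_max(score: float, max_total: float = MAX_TOTAL) -> float:
    """Return a score as a percentage of the maximum possible total."""
    return 100.0 * score / max_total


ai_pct = share_of_max(AI_AVG)
human_pct = share_of_max(HUMAN_AVG)
gap_points = HUMAN_AVG - AI_AVG

print(f"AI summaries:    {ai_pct:.1f}% of maximum")
print(f"Human summaries: {human_pct:.1f}% of maximum")
print(f"Gap:             {gap_points:.1f} points")
```

On these numbers, the AI summaries earned a little under half of the available points and the human summaries a little over four-fifths.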
