Projects

During my time at Lancaster University I worked on several multidisciplinary projects. Feel free to contact me with any questions.

Funded Projects at Lancaster University

FreeTxt: supporting bilingual free-text survey and questionnaire data analysis - March 2022 to March 2023

Title: FreeTxt: supporting bilingual free-text survey and questionnaire data analysis.
Start date: 1 March 2022. End date: 1 March 2023. Funder: GOV.WALES. Amount: $106k.
URL:
Abstract: This project aims to bridge this gap by building the novel 'FreeTxt' toolkit which is designed to support the analysis and visualisation of multiple forms of open-ended, free-text data in both English and Welsh. FreeTxt will draw on existing open-source bilingual corpus-based utilities and methodologies, repackaging these and taking them in a new direction so that they are relevant to new audiences/user-groups. We will work closely with project partners Cadw and National Trust Wales to co-design, co-construct and test FreeTxt to ensure that the resource is fit-for-purpose and fairly and consistently meets the needs of Welsh and English-language responses.

Welsh Automatic Text Summarisation - May 2021 to May 2022

Title: Gymraeg: ADNODD CREU CRYNODEBAU (ACC). English: Welsh Summary Creator (WSC).
Start date: 1 May 2021. End date: 1 May 2022. Funder: GOV.WALES. Amount: $127k.
URL: ADNODD CREU CRYNODEBAU (ACC)
Abstract: The Welsh summarisation tool will contribute to the automated tools available in the Welsh language and facilitate the work of those involved in document preparation, proof-reading, and (in certain circumstances) translation. The tool will also allow professionals to quickly summarise long documents for efficient presentation. For instance, the tool will allow educators to adapt long documents for use in the classroom. It is also envisaged that the tool will benefit the wider public, who may prefer to read a summary of complex information presented on the Internet or who may have difficulties reading translated versions of information on websites. Uniting a collaborative team with expertise in multilingual text summarisation (El-Haj, Lancaster University), Welsh language (Morris, Cardiff University) and corpus and language tools development (Knight, Cardiff University), the main objective of this project is to develop publicly available text summarisation tools for Welsh.

IFRS 15 - NLP and Text Analysis - April 2020 to April 2022

Title: An Assessment of Corporate Disclosures from Accounting Standards 15: Revenue from Contracts with Customers.
Start date: April 2020. End date: July 2022. Funder: IAAER/KPMG. Amount: $25k.

Abstract: The subjective and qualitative nature of most disclosures in a setting of information asymmetry makes it difficult to assess them with validity. Yet researchers have examined disclosures in many countries, with several accounting standards. Hellman et al. (2018) provide an excellent summary of this work. Although a self-constructed index seems to be the most common method of measuring disclosures, Beattie et al. (2004) describe alternative methods that have good potential. We apply four of these methods: (1) professional evaluation of a small sample, (2) thematic content analysis, (3) readability analysis, and (4) semantic analysis, for a multi-pronged approach. To connect with the literature in this area, we then corroborate our disclosure measures against those that would be obtained with a disclosure index, as well as comparing our highest-ranked firms to annual report award winners. Our research is among the first to explore disclosure on IFRS 15, and the first to apply a comprehensive set of methods.

Arabic USAS Semantic Tagger - June 2020 to August 2021

Title: Arabic USAS Semantic Tagger (AraSAS).
Start date: 1 June 2020. End date: 1 August 2022. Funder: Research Incentive Fund grant R19068 from Zayed University Office of Research. Amount: $7k.
URL: AraSAS
Abstract: A collaboration between Zayed University and New York University Abu Dhabi, the work provides the first Arabic Semantic Tagger, which we called AraSAS (https://arasas.herokuapp.com/). The project was funded by Zayed University and supervised by myself and Professor Paul Rayson at Lancaster University.

Detect Risk Warning through analysing UK Conference Calls - January 2015 to March 2015 - Finished

Title: Detect Risk Warning through analysing UK Conference Calls.
Start date: 1 January 2015. End date: 1 March 2015. Funder: Hedge Fund Company in London. Amount: $7k.

Abstract: The funding, from one of Europe's leading investment companies, was to develop a financial analysis tool that detects risk warnings by automatically analysing the language of UK and US financial reports and conference calls. I successfully completed the task and provided training and support for the company's users.

Analysing Narrative of UK PEAs and Annual Reports - September 2019 to July 2020 - Finished

SenseSourcing - December 2014 to July 2015 - Finished

Title: Support for the ‘Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators’.
Start date: 1 December 2014. End date: 1 July 2015. Funder: UCREL Lancaster University. Amount: $7k.
URL: SenseSourcing
Abstract: Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging, time-consuming manual task. This has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods employing native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants semantically annotated 250 words manually for Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. In order to avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process where users were asked to identify synonyms in the target language, and remove erroneous senses.

CFIE Financial Text Analysis using NLP - December 2012 to January 2020 - Finished

Title: The Corporate Financial Information Environment.
Start date: December 2012. End date: January 2020. Funder: ESRC. Amount: £424,390.
URL: CFIE Project
Abstract: The quality of information provided to investors by corporate management in publicly traded companies is a matter of central importance to financial market participants. Narrative commentaries represent an increasingly significant component of financial communications. While financial narratives in the UK are shaped in part by prevailing regulations, senior management enjoys significant discretion over the content, structure and presentation of these disclosures. The informativeness of financial narrative disclosures and the way management apply their reporting discretion are key questions for academics and policymakers. Partnering with the UK body responsible for promoting high quality corporate governance and financial reporting - the Financial Reporting Council (FRC) - this interdisciplinary project will combine expertise from accounting with state-of-the-art methods from computational linguistics to examine two key elements of financial disclosure. The first aspect is preliminary earnings announcements (PEAs), which arguably represent the most important disclosure in UK firms' annual reporting calendar. The second aspect is the annual report to shareholders, which forms the largest single recurring disclosure commitment for management. Two opposing perspectives exist on corporate narrative disclosures. On the one hand, proponents argue that narratives provide information beyond that contained in financial data. On the other hand, opponents claim that management exploit the discretion embedded in narrative reporting to obfuscate or present a biased representation of actual performance. While extant work on UK annual report and PEA narrative disclosures provides evidence consistent with both perspectives, both the scope of the research and the generalizability of findings are compromised because conclusions rely on manual coding methods applied to small samples.
This project will develop and use state-of-the-art computerized textual analysis methods to study the properties and usefulness of financial narratives for a comprehensive sample of UK disclosures published between 2003 and 2016. While researchers are already using these methods to study disclosures made by US companies, problems accessing digital PEAs and annual reports coupled with inconsistent document structure have hindered computerized analysis of UK financial narratives and skewed research agendas away from studying UK reporting outcomes. This project will shine much needed light on two key aspects of UK narrative reporting. The work will provide the first large sample analysis of PEA narratives. The project will also examine a set of contemporary policy-relevant themes relating to the content and structure of UK annual reports. Software tools and datasets from the project will also create new opportunities for the research community. Policymakers are facing pressure to adopt evidence-based approaches to regulation. While the FRC is committed to conducting impact and evaluation analyses, it is reliant on a relatively small team of research staff to undertake such work, much of which involves manual collection and analysis of unstructured data. The labour-intensive nature of the work inevitably yields results that are hard to generalize and constrains the scope of the FRC's work. As well as examining novel and policy-relevant research questions, this project will embed computerized text analytics methods in the FRC's formal policymaking processes. The methods will complement existing approaches by facilitating lower cost and more comprehensive assessments of regulatory changes and emerging issues in narrative reporting.

Biomedical Text Mining - August 2016 to March 2019 - Finished

Title: Biomedical Text Mining (BTM).
Start date: August 2016. End date: March 2019. Funder: Wellcome Trust. Amount: £200,000.
URL: BioTM Project
Abstract: This Biomedical Text Mining (BTM) website is used to record information about interdisciplinary research activities at Lancaster University related to explorations of biomedical literature and data using Natural Language Processing and Corpus Linguistics methodologies.