Contact Details
| Phone: | +61 (07) 3864 1944 |
| Fax: | +61 (07) 3864 1801 |
| Address: | 2 George St GPO Box 2434 Brisbane QLD 4001 Australia |
| Room: | GP S642 |
I have been involved with teaching at QUT since Semester 1,
2000. I began as a tutor for Software Development I (ITB410)
and Foundations of Computing (ITB106).
I have taught Technology of Information Systems (ITD412)
and Software Development I (ITD410)
at QUT's
| Year | Semester | Unit | Role |
| 2003 | 2 | ITB421 – Software Development 3 | Tutor |
| 2003 | 2 | ITB432 – Advanced Programming Laboratory | Co-ordinator, Lecturer, Tutor |
| 2003 | 2 | ITN432 – Advanced Programming Laboratory | Co-ordinator, Lecturer, Tutor |
| 2003 | 1 | ITB111 – Software Development 1 | Tutor |
| 2003 | 1 | ITB421 – Software Development 3 | Tutor |
| 2003 | 1 | ITB432 – Advanced Programming Laboratory | Co-ordinator, Lecturer, Tutor |
| 2003 | 1 | ITN432 – Advanced Programming Laboratory | Co-ordinator, Lecturer, Tutor |
| 2002 | 2 | ITB410 – Software Development 1 | Moderator |
| 2002 | 2 | ITB411 – Software Development 2 | Tutor |
| 2002 | 2 | ITB421 – Software Development 3 | Tutor, Duty Tutor |
| 2002 | 2 | ITB432 – Advanced Programming Laboratory | Co-ordinator, Lecturer, Tutor |
| 2002 | 1 | ITB410 – Software Development 1 | Lecturer, Tutor |
| 2002 | 1 | ITB432 – Advanced Programming Laboratory | Tutor |
| 2001 | 2 | ITB410 – Software Development 1 | Tutor |
| 2001 | 2 | ITB411 – Software Development 2 | Tutor |
| 2001 | 2 | ITB420 – Computer Architecture | Tutor |
| 2001 | 2 | ITB432 – Advanced Programming Laboratory | Tutor |
| 2001 | 2 | ITD410 – Software Development 1* | Co-ordinator, Lecturer, Tutor |
| 2001 | 1 | ITB106 – Foundations of Computing | Tutor |
| 2001 | 1 | ITB432 – Advanced Programming Laboratory | Tutor |
| 2001 | 1 | ITD412 – Technology of Information Systems* | Co-ordinator, Lecturer, Tutor |
| 2000 | Summer | ITD412 – Technology of Information Systems* | Co-ordinator, Lecturer, Tutor |
| 2000 | 2 | ITD412 – Technology of Information Systems* | Co-ordinator, Lecturer, Tutor |
| 2000 | 1 | ITB106 – Foundations of Computing | Tutor |
| 2000 | 1 | ITB410 – Software Development 1 | Tutor |
*
at
Semester 1, 2004
| Day | Time | Location |
| Tuesday | 3 pm to 5 pm | |
| Wednesday | 1 pm to 3 pm | |
I completed my Master of Information Technology (Research) thesis in March 2003, entitled “Analysing E-mail Text Authorship for Forensic Purposes”. This research was undertaken in the Information Security Research Centre (ISRC), Faculty of Information Technology (FIT) at Queensland University of Technology. My supervisors for this project were Dr Alison Anderson and Associate Professor George Mohay.
In this work I attempted to determine whether the authorship of an e-mail message can be identified from the text within the message. I used a machine learning approach to build a model of each author's style. Once a series of models has been learnt, unidentified or anonymous e-mail messages can be compared against the models to classify the authorship of the message. Results were quite satisfactory, with better than 85% accuracy in identifying the authorship of e-mail messages from authors for whom a model exists.
As an adjunct to this work, I attempted to determine whether certain features can be used to distinguish between different cohorts of authors. For example, it would be useful as a filtering tool for forensic purposes to determine whether an e-mail message was written by a male or female (i.e. gender cohorts) or by a person who has English as a first or second language (EFL vs ESL). EFL vs ESL gave better results than male vs female comparisons.
The machine learning tool used for this body of research was the Support Vector Machine, in the SVMlight implementation prepared by Thorsten Joachims.
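To make the classification step concrete, here is a minimal sketch of training a linear Support Vector Machine to separate two authors. It is not SVMlight and not the code used in the research: it swaps in a small pure-Python Pegasos-style sub-gradient trainer, and the two features, their values, and the function names are all hypothetical. The features are assumed to be standardised already, so no bias term is learnt.

```python
import random

def train_linear_svm(samples, labels, lam=0.1, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    samples: equal-length feature vectors, assumed standardised (centred),
             so no bias term is learnt.
    labels:  +1 or -1 for the two candidate authors.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # L2 regularisation shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge loss: only examples inside the margin move the weights.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (author A) or -1 (author B) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Hypothetical standardised features per message:
# (average sentence length, rate of informal contractions)
train_x = [(0.8, -0.6), (1.0, -0.9), (0.7, -0.8),   # author A
           (-0.9, 0.7), (-0.6, 1.0), (-0.8, 0.5)]   # author B
train_y = [1, 1, 1, -1, -1, -1]

weights = train_linear_svm(train_x, train_y)
print(predict(weights, (0.9, -0.7)))   # unseen message resembling author A
print(predict(weights, (-0.7, 0.8)))   # unseen message resembling author B
```

With more than two candidate authors, one such model per author (that author against all others) is a common arrangement, which matches the idea of comparing an anonymous message against a series of learnt models.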
· FIT Collaborative Development Grant – 2002
“Hidden Markov Models for Authorship and Cohort Analysis”, Malcolm Corney, Ross Hayward, Jim Hogan, Alison Anderson
This work addresses the important computer forensics problems of author attribution and cohort analysis through the application of probabilistic Hidden Markov Models, a technique which has a long history of proven success on related problems in bioinformatics and natural language processing. Our approach will be based upon the hierarchical combination of such models – informed at the macroscale by document conventions and at the microscale by phrase structure grammars – facilitating robust learning and inference across a variety of domains. Unlike many preceding approaches, our work is couched within a well developed probabilistic framework, allowing precise estimates of the quality of our predictions and a clear methodology for model improvement.
· QUT Small Research Grants Scheme – 2003
“Probabilistic Models for Authorship and Cohort Analysis”, Jim Hogan, Ross Hayward, Malcolm Corney
Identification of the person responsible for a text document is a problem of great forensic importance, and one of wider significance since the advent of the internet and e-mail. This work addresses this problem by scoring unknown text with respect to a set of author (cohort)-specific local probability models, with authorship (cohort membership) determined through a weighted combination of local scores. This framework allows formal probability estimates to be attached to predictions, avoiding the pitfalls of traditional linguistic methods and greatly limiting false identifications. Our approach is a novel application of Hidden Markov Models and Sentence Mixture Models, which have a history of success in language processing applications.
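The scoring idea in this summary can be illustrated with a small sketch. It does not implement Hidden Markov Models or Sentence Mixture Models; as a stand-in it uses two much simpler local models per author (smoothed word unigrams and character bigrams), combines their average log-probabilities with fixed weights, and normalises the result. All names, the toy corpus, and the weights are assumptions, and the normalised values are not calibrated probabilities.

```python
import math
from collections import Counter

def word_tokens(text):
    return text.lower().split()

def char_bigrams(text):
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

class LocalModel:
    """Add-alpha smoothed multinomial model over one kind of token."""

    def __init__(self, tokenizer, texts, vocab, alpha=1.0):
        self.tokenizer = tokenizer
        self.counts = Counter(tok for t in texts for tok in tokenizer(t))
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab)
        self.alpha = alpha

    def avg_log_prob(self, text):
        toks = self.tokenizer(text)
        if not toks:
            return 0.0
        denom = self.total + self.alpha * self.vocab_size
        return sum(math.log((self.counts[t] + self.alpha) / denom)
                   for t in toks) / len(toks)

def build_models(corpus, tokenizers):
    """Build one local model per (author, tokenizer) pair."""
    models = {author: [] for author in corpus}
    for tok in tokenizers:
        # A vocabulary shared by all authors keeps the scores comparable.
        vocab = {t for texts in corpus.values() for text in texts for t in tok(text)}
        for author, texts in corpus.items():
            models[author].append(LocalModel(tok, texts, vocab))
    return models

def attribute(models, text, weights):
    """Weighted combination of local scores, normalised over the authors."""
    scores = {a: sum(w * m.avg_log_prob(text) for w, m in zip(weights, ms))
              for a, ms in models.items()}
    top = max(scores.values())
    exp = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(exp.values())
    return {a: e / z for a, e in exp.items()}

# Toy corpus, invented for illustration only.
corpus = {
    "alice": ["please send the report today", "please review the report"],
    "bob": ["gonna grab lunch now", "gonna be late lol"],
}
models = build_models(corpus, [word_tokens, char_bigrams])
confidence = attribute(models, "please send the review", weights=(0.5, 0.5))
print(max(confidence, key=confidence.get), confidence)
```

The same structure applies to cohorts: replace the per-author texts with per-cohort texts and the normalised values rank cohort membership instead.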
· M. Corney, “Analysing E-mail Text Authorship for Forensic Purposes”, 181 pages, 2003.
Abstract
E-mail has become the most popular Internet application and with its rise in use has come an inevitable increase in the use of e-mail for criminal purposes. It is possible for an e-mail message to be sent anonymously or through spoofed servers. Computer forensics analysts need a tool that can be used to identify the author of such e-mail messages.
This thesis describes the development of such a tool using techniques from the fields of stylometry and machine learning. An author's style can be reduced to a pattern by making measurements of various stylometric features from the text. E-mail messages also contain macro-structural features that can be measured. These features together can be used with the Support Vector Machine learning algorithm to classify or attribute authorship of e-mail messages to an author providing a suitable sample of messages is available for comparison.
In an investigation, the set of authors may need to be reduced from an initial large list of possible suspects. This research has trialled authorship characterisation based on sociolinguistic cohorts, such as gender and language background, as a technique for profiling the anonymous message so that the suspect list can be reduced.
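As a rough illustration of the kind of measurements the abstract refers to, the sketch below turns one e-mail message into a small set of stylometric and structural features. The specific features, the function-word list, and the greeting and farewell patterns are hypothetical examples chosen for brevity; they are not the feature set used in the thesis.

```python
import re

# Illustrative subset only; not the feature set used in the thesis.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "is", "it")
GREETING = re.compile(r"^(hi|hello|dear)\b", re.IGNORECASE)
FAREWELL = re.compile(r"^(cheers|regards|thanks)\b", re.IGNORECASE)

def extract_features(message):
    """Measure simple stylometric and structural features of one message."""
    lines = [ln.strip() for ln in message.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z']+", message)
    lowered = [w.lower() for w in words]
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    n_words = max(len(words), 1)

    features = {
        # Stylometric measurements taken from the text itself.
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Structural measurements specific to e-mail layout.
        "has_greeting": bool(lines) and bool(GREETING.match(lines[0])),
        "has_farewell": any(FAREWELL.match(ln) for ln in lines),
        "quoted_line_ratio": sum(ln.startswith(">") for ln in lines) / max(len(lines), 1),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = lowered.count(fw) / n_words
    return features

sample = ("Hi Bob,\n\nThe report is in the shared folder. Please read it today!\n\n"
          "> Where is the report?\n\nCheers,\nAl")
for name, value in extract_features(sample).items():
    print(name, value)
```

Each message becomes one fixed-length vector of such values, which is the form a Support Vector Machine or similar classifier expects as input.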
· M. Corney, A. Anderson, G. Mohay & O. de Vel, "Identifying the Authors of Suspect Email", accepted by Computers and Security Journal, 2002.
[abstract][Not available online]
Abstract
In this paper, we present the results of an investigation into identifying the authorship of email messages by analysis of the contents and style of the email messages themselves. A set of stylistic features applicable to text in general and an extended set of email-specific structural features were identified. A Support Vector Machine learning method was used to discriminate between the authorship classes. Through a series of baseline experiments on non-email data, it was found that approximately 20 email messages with approximately 100 words in each message should be sufficient to discriminate authorship in most cases. These results were confirmed with a corpus of email data and performance was further enhanced when a set of email-specific features were added. This outcome has important implications in the management of such problems as email abuse, anonymous email messages and computer forensics.
Abstract
We describe an investigation into e-mail content mining for author identification, or authorship attribution, for the purpose of forensic investigation. We focus our discussion on the ability to discriminate between authors for the case of both aggregated e-mail topics as well as across different e-mail topics. An extended set of e-mail document features including structural characteristics and linguistic patterns were derived and, together with a Support Vector Machine learning algorithm, were used for mining the e-mail content. Experiments using a number of e-mail documents generated by different authors on a set of topics gave promising results for both aggregated and multi-topic author categorisation.
· O. de Vel, A. Anderson, M. Corney and G. Mohay, "Email Authorship Attribution for Computer Forensics", in Daniel Barbara, Sushil Jajodia, “Applications of Data Mining in Computer Security”, ISBN 1-4020-7054-3, Kluwer Academic Publishers, Boston, 2002, 252 pages.
[abstract][Book URL][Chapter not available online]
Abstract
In this chapter, we briefly overview the relatively new discipline of computer forensics and describe an investigation of forensic authorship attribution or identification undertaken on a corpus of multi-author and multi-topic e-mail documents. We use an extended set of e-mail document features such as structural characteristics and linguistic patterns together with a Support Vector Machine as the learning algorithm. Experiments on a number of e-mail documents generated by different authors on a set of topics gave promising results for both inter- and intra-topic author categorisation.
· M. Corney, O. de Vel, A. Anderson & G. Mohay, "Gender-Preferential Text Mining of E-mail Discourse", accepted for 18th Annual Computer Security Applications Conference (2002 ACSAC) December 9 – 14, 2002, Las Vegas, Nevada, USA.
[abstract][pdf]
Abstract
This paper describes an investigation of authorship gender attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for author gender categorisation.
· O. de Vel, M. Corney, A. Anderson and G. Mohay, “Language and Gender Author Cohort Analysis of E-mail for Computer Forensics”, Digital Forensic Research Workshop,
[abstract][pdf]
Abstract
We describe an investigation of authorship gender and language background cohort attribution mining from e-mail text documents. We used an extended set of predominantly topic content-free e-mail document features such as style markers, structural characteristics and gender-preferential language features together with a Support Vector Machine learning algorithm. Experiments using a corpus of e-mail documents generated by a large number of authors of both genders gave promising results for both author gender and language background cohort categorisation.
· O. de Vel, A. Anderson, M. Corney & G. Mohay, "Multi-Topic E-mail Authorship Attribution Forensics", ACM Conference on Computer Security - Workshop on Data Mining for Security Applications, November 8, 2001, Philadelphia, PA, USA.
[abstract][pdf]
Abstract
In this paper we describe an investigation of forensic authorship identification or categorisation undertaken on multi-topic e-mail documents. We use an extended set of e-mail document features such as structural characteristics and linguistic patterns together with a Support Vector Machine learning algorithm. Experiments on a number of e-mail documents generated by different authors on a set of topics gave promising results for both inter- and intra-topic author categorisation.
This page is maintained by Malcolm Corney and was last updated on 4th June, 2004.
For comments or queries about the material on this page, email Malcolm Corney at m.corney@qut.edu.au