Category Archives: work in progress

Corporate Genres…

Next week, I’m going to teach a seminar on English on the web at Potsdam University. I was invited by Prof. Dr. Barth-Weingarten because I had taught a seminar on “English in the New Media” there in 2014. The current seminar’s title is “English@Work” and it focusses on the use of English in professional settings. So, even though I’ve already worked a bit on web-based English, this seminar session will be quite a challenge – it’s “undiscover’d country” for me.

I’ve been working hard over the last few weeks to develop ideas and a feasible plan for the seminar. Luckily, Jana Pflaeging allowed me to pick up on the structure we used for our seminar at Zagreb University, where we compared the two genres “ListSite” and “Personal Weblog” in group work. My idea for Potsdam is now to do the same with corporate websites and corporate blogs (mainly drawing on Poppi 2012 for the former and Puschmann 2010 for the latter).

I think I’ll pursue the question of how corporate websites and corporate blogs tackle the challenge of creating a favourable corporate image for the web’s many different recipients. I’ll show that corporate blogs address companies’ need to present themselves as interactive and accessible and, thanks to Puschmann’s previous work, we can also deal with a blog that only at second glance turns out to be a corporate one (i.e. it pretends to be something else) – which will be highly interesting 🙂 We will, therefore, compare some language features, participation frameworks, topics and functions – each aspect to be worked on by one group of students. Still, I am very excited and hope that the session turns out to be a good one…

In two weeks’ time, I’ll use the food for thought of this seminar session in a talk on corporate genres at Halle University – so I’m really looking forward to the new input I’ll get from the Potsdam students! 🙂

Book Published :-)

Last week, I received a parcel from the Peter Lang Verlag. Unfortunately, I was too excited to actually create an “unboxing video” (as described by Klaus Kerschensteiner in the first issue of 10plus1: Living Linguistics)… It contained the monograph The Personal Weblog: A Linguistic History that I had worked on whenever I had time in 2015.

It grew into more than a mere translation of my PhD thesis – effectively, I wrote the book anew. And I enjoyed it, as I had the feeling that I could write more freely after the content and the ideas had had some time to settle. The result is, at least I hope so, a readable monograph that is much shorter than my PhD thesis and that also contains a number of new ideas that had not yet been developed when I wrote the PhD (e.g. actually mapping the prototypical distribution of features in diagrams that are based on statistics and, in fact, very much resemble Lemke’s (1999) theoretical sketches).

The book is now out for criticism and discussion – and I’m looking forward to both 🙂

I am very thankful to so many people who have accompanied me on the way to this book. Therefore, I’d like to reproduce the acknowledgements here:

[Image: Peter Lang acknowledgements]

Nightbus to Zagreb (with Jana Pflaeging)

In a few days, I’ll teach a seminar session on viral and non-viral genres with my colleague and friend Jana Pflaeging at Zagreb University. We’ll compare the genres “ListSite” (Jana published on it in 10plus1) and “Personal Weblog” on several layers. I’m very much looking forward to it – even though it’ll be quite stressful: we’ll take the night bus from Munich on Monday, teach the seminar on Tuesday morning, and then return to Munich on the 11pm night bus 🙂

Current Book Project “The Personal Weblog: A Linguistic History” and Peter Lang Nachwuchspreis

After I finished my PhD thesis “Textsorten im Internet zwischen Wandel und Konstanz: Eine diachrone Untersuchung der Textsorte Personal Weblog” in June 2014, I immediately published it as an open-access version. For various reasons, I had written my PhD in German. One of my first thoughts after publishing it was to turn it into a “proper” book – in English this time (by “proper book” I mean, for instance, reducing the length from 450 to roughly 200 pages, cutting away typical dissertation rhetoric, and orienting it not towards examiners but towards an interested semi-expert audience). The working title is “The Personal Weblog: A Linguistic History”. So far, I have finished a chapter on genre theory (including genre change) and one of the two concluding chapters.

An additional motivation for continuing with this project comes from the Peter Lang Verlag: My PhD and the planned English book based on it were awarded the Peter Lang Nachwuchspreis, which includes coverage of all publication costs for a print and an ebook edition. The award came as quite a surprise, but I am really grateful for the opportunity and the additional motivation it offers for my book project!

Feedback on Chapter 8: Textual Function

Last week, my mentor gave me detailed feedback on chapter 8 of my thesis. All in all, the feedback was quite positive. The most important points to work on include:

  • The chapter could do with some restructuring. Until the feedback session, I did not realise that some aspects discussed in section 8.3.1 of the chapter (functions as ethnocategories and their prototypicality) could be tightly linked to the first section, where some theoretical considerations on textual functions and a methodology for their analysis are offered together with a review of research on Weblogs’ functions (see Table of Contents May 2013).
  • In general, we started thinking about restructuring the thesis in order to increase readability. Chapter 2, which was actually intended to develop a genre model in detail, including discussions of the individual layers, might instead just serve as a rough sketch of the genre concept and its socio-cognitive components as well as the layers; the detailed discussions would then be postponed until the first part of each analytical chapter. I hope that works out… I have also started thinking about how I could shorten the thesis and be more concise in the end.
  • The ethnocategories should be discussed in more detail, especially concerning the question of whether they are really functionally determined or rather bundles of features that contain a good deal of structure as well. I tend to assume the latter, but should strengthen this aspect, especially because my chapter contains a whole lot of structural analysis as the basis for arguing for functional ascriptions.
  • Maybe some parts of the structural analysis will become redundant when chapter 7 is developed; then I could shorten the analyses a little.
  • My “entertainment” function is analysed as an appellative function urging readers to view a text as entertaining. That is, actually, meta-communicative and therefore situated on a different layer from all the other functions… I should therefore treat the entertainment aspect in a separate sub-chapter (and not next to advertisements, for instance).
  • Other functions, such as teasing or boasting, could also be addressed in this rather small sub-chapter.

All in all, I am starting to realise that I am reaching a point where I have to think about the whole of the thesis again… so I am already entering the phase of transition into the last stage of the thesis. I hope I can manage the amount of work still before me by the end of the year…

Chapter 8: Multimodal Structure

After handing in chapter 7 (Textual Function) and while waiting for feedback on those nearly 80 pages, I started working on the chapter on multimodal structure. This chapter is basically a core linguistic one and should contain analyses of the following aspects:

  • Macro Structure:
    • layout of blog pages (header, sidebars, body etc.)
    • blog pages as part of a network of pages (about, pictures, homepage etc.)
  • Meso Structure (is that a proper term?):
    • key elements of blog postings (meta links, tags, categories)
    • key elements of sidebars (meta links, blogrolls etc.)
    • thematic structure of blog postings (?)
  • Micro Structure:
    • language and image
    • register / style: key words, frequency counts, sentence and word length…
    • hyperlinks and their uses
    • topics, subtopics, topical coherence

As always, I did have a rough idea about what the chapter should deal with, but I did not know how to gather the necessary data. So I did quite extensive research on corpus software, comparing the abilities of particular programs and always asking myself whether I actually needed what was offered.

I came across the program TreeTagger by Helmut Schmid (described in detail in Schmid 1994). The software can be used on .txt files and creates a vertical .txt file (one token per line) with a POS tag added to each token the tagger knows. Installing the program on Windows is not easy for novices, as it was actually designed to run on Linux and still needs the command shell. There is, however, also a graphical interface, which I tried out (of course) and which works quite well.
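Just to illustrate the format: for English input, the vertical output roughly looks like this, with token, POS tag and lemma per line (the exact columns depend on the parameter file used):

```
The	DT	the
blog	NN	blog
is	VBZ	be
updated	VBN	update
daily	RB	daily
.	SENT	.
```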

TreeTagger serves as a POS tagger only. Meik Michalke provides a software package – koRpus – which works within the R framework. The koRpus package can tag .txt files using TreeTagger and afterwards run some frequency analyses on the text in question. As it is written by a psychologist, its focus lies on readability measures. Since my knowledge of R is quite limited and I (after everything had looked really promising for a while) became disappointed with the measures available – and especially with the way the data generated by the analyses is stored and made available for further use; I was not able to really figure that out, not even with the graphical R interface RKWard – I decided not to use koRpus and to look further for other software.

And I found WordSmith, a program that offers the following (unfortunately not on an open-source basis like the R packages, and therefore not for free…):

  • word lists, frequency analyses and measures such as sentence and word length
  • key words in texts or groups of texts, based on the word lists of single texts; key words can be compared with established corpora such as the BNC
  • concordances (even though I probably will not need those)

I was especially thrilled by the key word feature, as it makes it possible to identify key topics when the 10 to 20 most frequent nouns in a text (or in all texts of a period) are understood as indicators of the topics most frequently dealt with. An example: I did a key word analysis on two texts of period one and found IT words among the most frequent nouns. This was what I expected, as the first weblog authors were mainly IT experts and their weblogs dealt (among other, more personal topics, as in EatonWeb, for instance) with IT stuff, software, new links on the web, Apple vs. Microsoft and so on. I now hope to use this key word tool for a broader analysis, aiming at extrapolating topical shifts across the periods.
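Just to sketch the idea (this is not WordSmith’s keyness statistic, merely a plain frequency count over TreeTagger’s vertical output; the file name and the restriction to the Penn Treebank noun tags are my own assumptions):

```python
# Minimal sketch: most frequent noun lemmas in a TreeTagger vertical file
# (token <TAB> POS <TAB> lemma per line) as rough indicators of key topics.
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags

counts = Counter()
with open("blog_period1.txt", encoding="utf-8") as f:  # hypothetical file name
    for line in f:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            token, pos, lemma = parts
            if pos in NOUN_TAGS:
                counts[lemma.lower()] += 1

for lemma, freq in counts.most_common(20):
    print(f"{freq:4d}  {lemma}")
```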

So, currently I am working my way through all 330 corpus texts again, carrying out the following steps (as always, I use SPSS for my statistics):

  1. I count the hyperlinks used in the entries, differentiating between external links (the URL points to another domain), internal links (the URL remains within the same domain; links to categories, for example, are internal as well), meta links (permalinks, trackbacks and comment links, mostly at the end of postings; category links do not belong here and are counted as internal links, since some period I weblogs already offer internal category links but no other meta links – and I also want to get neat data for the categories) and other links (mailto:, downloads etc.). A rough sketch of this classification follows below the list.
  2. I count other meso-structural features such as blogrolls, guest books and so on. Maybe some trends will show up after a bit of counting…
  3. I determine a layout-type – Schlobinski & Siever (2005) suggested some and I extended their typology.
  4. I code the text in MAXQDA for special features like emoticons, rebus forms, oral features, graphostyle…
  5. I generate a PDF file from the website, which is imported to MAXQDA as well. This PDF file is used for coding the language–image interplay and image types. Currently, I am doing some rough coding, intending to get more fine-grained later on.
  6. I generate a .txt file with the postings of the weblog. This .txt file will later be used in WordSmith.
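As announced in step 1, here is a rough sketch of the four-way link classification (the blog domain and the example URLs are made up; whether a link is a meta link depends on its position and anchor text, so I simply pass that information as a flag):

```python
# Rough sketch of the link classification from step 1 above.
# blog_domain and the example URLs are hypothetical.
from urllib.parse import urlparse

def classify_link(url: str, blog_domain: str, is_meta: bool = False) -> str:
    if url.startswith("mailto:") or url.lower().endswith((".zip", ".pdf")):
        return "other"     # mailto:, downloads etc.
    if is_meta:
        return "meta"      # permalinks, trackbacks, comment links
    host = urlparse(url).netloc
    if host == "" or host.endswith(blog_domain):
        return "internal"  # same domain, incl. category links
    return "external"      # URL points to another domain

print(classify_link("https://example.org/post", "myblog.com"))                        # external
print(classify_link("/category/tech", "myblog.com"))                                  # internal
print(classify_link("https://myblog.com/2003/05/post#comments", "myblog.com", True))  # meta
print(classify_link("mailto:author@myblog.com", "myblog.com"))                        # other
```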

This procedure takes a while. As it is quite exhausting as well, I can only analyse around 20 texts per day. So that means around 6 weeks of work until I can move on to the WordSmith analyses and the language-image interplay (I’m really dreading that…).

Corpus Update

As I pointed out in my first post, one comment about the diachronic corpus of Personal Weblogs my thesis is based on concerned the number of texts, especially in the later periods (an outline of the corpus structure can be found in the talks “Anything goes – everything done?” and “Stability, Diversity, and Change. The Textual Functions of Personal Weblogs”). People argued that a low number of texts was fine for period one, as there were only few weblogs around in those days. For later periods, however, higher numbers of texts were expected, as access grew easier with more recent collection dates.

I have been thinking about these comments ever since, trying to find arguments for not extending the corpus. What I found, however, were quite weak excuses. What is more, I started wondering how I could justify a particular number of texts for a given period at all. I came up with the following line of reasoning:

  • I work with both qualitative and quantitative methods, even though my general focus lies on the qualitative end of the continuum. Text numbers, therefore, have to be justified both from a qualitative and a quantitative point of view.
  • The qualitative framework of my thesis is heavily inspired by Grounded Theory (e.g. following Glaser & Holton 2004). In Grounded Theory, there is a process called “Theoretical Sampling” that combines data collection, coding and analysis. The basic idea is that data collection is guided by the emerging theory and strives for theoretical saturation. In other words: If nothing new is found – no conflicting cases, no cases challenging the categories established so far – the analyst has come close enough to theoretical saturation to stop collecting samples. (Footnote: He might just as well have become blind to new phenomena through excessive preceding analysis. In that case, however, further collection of samples would not help the research project, either.) So that is exactly the qualitative part of my argumentation: collecting text samples until nothing new or challenging is discovered. This point had almost been reached after collecting and analysing 80 to 90 texts for the periods II.A to II.C, but it was good to put my categories to the test by collecting more texts and assimilating them into my theory.
  • From a quantitative point of view, a researcher has to make some kind of informed guess as to how many cases will probably be enough to make statistically sound statements. One formula suggested by Raithel (2008: 62) uses the number of variables to be combined in one analytical step (e.g. two variables in a correlation study) and their associated features (e.g. two features for the variable “gender”); this value is multiplied by 10: n ≥ 10 · K^V, where K is the number of features and V the number of variables. As I try to trace the change within several variables that are investigated separately from each other, my analytical steps quite often contain only one variable with a particular number of features. The variable with the highest number of features at present is the textual function, with about ten distinct features (e.g. Update, Filter, Sharing Experience, as outlined in my last post). Consequently, about 100 texts per period are roughly enough according to this formula (see the sketch below). This is quite a tight budget; if I want to correlate the variable “textual function” with the variable “gender of author”, I have to point out that the results give some hint at a possible statistical connection but have to be taken with a pinch of salt.
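A minimal sketch of that calculation (my reading of Raithel’s formula; generalising to variables with different feature counts by multiplying all feature counts is my own assumption):

```python
# Sample-size check following Raithel (2008: 62): n >= 10 * K**V.
# Generalising to variables with different feature counts via the product
# of all feature counts is my own assumption, not part of the formula.
from math import prod

def min_sample_size(feature_counts):
    """Minimum n for one analytical step, given the features per variable."""
    return 10 * prod(feature_counts)

print(min_sample_size([10]))     # textual function alone: 100 texts per period
print(min_sample_size([10, 2]))  # textual function x gender: 200 texts needed
                                 # -> more than my 100, hence the pinch of salt
```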

I think that both arguments taken together form a fairly stable basis for justifying the number of cases. I also guess that 100 texts in the periods II.A, II.B and II.C are a good compromise between striving for ever higher case numbers and feasibility – qualitatively and thoroughly analysing, say, 500 texts in each period would hardly be possible.

So, after the extension phase, which took me a bit more than one week of searching for texts, coding, basically repeating all analytical steps I had done before and updating the numbers in my thesis, the corpus now looks like this (snapshot from my screen, sorry for the quality):

[Screenshot: DIABLOK]