  1. Nov 2018
    1. The formula for calculating normalised frequency is: (observed frequency of your search term × basis of normalisation, e.g. 1 million) / total corpus size.

      Formula for normalised frequency.

    2. So here we learn that all linguistic features are related to culture.
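
The normalised frequency formula above can be sketched in Python as a small helper. The function name `normalised_frequency` and the figures in the example are hypothetical and serve only to show the arithmetic. The default basis of 1 million follows the example in the note.

```python
def normalised_frequency(observed: int, corpus_size: int, basis: int = 1_000_000) -> float:
    """Frequency of a search term per `basis` words (e.g. per million words)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # (observed frequency x basis of normalisation) / total corpus size
    return observed * basis / corpus_size


# Hypothetical figures: the same term in two corpora of different sizes
print(normalised_frequency(250, 5_000_000))  # 50.0 per million words
print(normalised_frequency(120, 2_000_000))  # 60.0 per million words
```

Putting both counts on the same basis is what makes corpora of different sizes comparable. Compared raw, 250 hits looks like more than 120. Normalised, the second corpus uses the term more often (60 vs 50 per million words).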

    1. Annotation can be defined as “The process of adding […] interpretive, linguistic information to an electronic corpus of spoken and/or written language data” (Leech 1997). So annotation involves adding interpretive linguistic information to a text (e.g. part-of-speech). Markup provides non-linguistic, objective, verifiable information (e.g. author, paragraph boundary). Tagging is the process of applying specific conventions (tags) to a text for annotation/markup purposes (e.g. XML tags). The key distinction between annotation and markup is the type of information they add to a text. Tagging is then a method of annotation/markup.

      So there are the three of them, clearly identified and unambiguous: annotation (interpretative information), markup (metadata), and tags (conventions).
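
Here is a minimal sketch that makes the three-way distinction concrete. It reads a small, made-up XML fragment with Python's standard `xml.etree.ElementTree`. The element and attribute names (`text`, `p`, `w`, `author`, `year`, `pos`) are simplified assumptions, not a specific standard such as TEI. The author and year values are invented. The part-of-speech values are CLAWS C5-style tags, the kind used in the BNC. Read this way, the XML tags are the conventions (tagging). `author`, `year` and the paragraph boundary are markup. The `pos` values are annotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML fragment (not a real text or a real standard)
SAMPLE = """
<text author="Jane Doe" year="1850">
  <p>
    <w pos="AT0">The</w>
    <w pos="NN1">mouse</w>
    <w pos="VVD">ran</w>
  </p>
</text>
""".strip()

root = ET.fromstring(SAMPLE)

# Markup: objective, verifiable information about the text and its layout
markup = {
    "author": root.get("author"),
    "year": root.get("year"),
    "paragraphs": len(root.findall("p")),
}

# Annotation: interpretive linguistic information attached to each word
annotation = [(w.text, w.get("pos")) for w in root.iter("w")]

print(markup)      # {'author': 'Jane Doe', 'year': '1850', 'paragraphs': 1}
print(annotation)  # [('The', 'AT0'), ('mouse', 'NN1'), ('ran', 'VVD')]
```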

    1. COCA (https://corpus.byu.edu/coca/). You can access the BNC free of charge through CQPweb (https://cqpweb.lancs.ac.uk/).

      English corpora.

    2. (https://www.birmingham.ac.uk/schools/edacs/departments/englishlanguage/research/projects/clic/index.aspx)

      Now comes my area: a literary corpus for linguistics.

    3. There's the BAWE corpus (about 6 million words) and the Cambridge Academic English corpus (CAEC) (almost 4 million words). I am not sure whether those corpora are small enough for your research purposes. You can have a look at the list of academic corpora available through Sketch Engine (https://www.sketchengine.eu/user-guide/user-manual/corpora/corpora-list/).


    4. If I am comparing a corpus of the works of Robert Burns with a general corpus of Late Modern English, does it make sense to remove the Burns texts from the general corpus?

      Extracting a specialized corpus from a general one, and analyzing it against that general corpus.
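
If the Burns texts stay in the general corpus, they count on both sides of the comparison. The toy sketch below uses made-up tokens and hypothetical text IDs (`burns_01`, `other_01`, `other_02`). It shows how keeping that overlap raises the reference frequency of a word typical of the study corpus (here "lass"). That higher reference frequency would make the word look less distinctive of Burns. The sketch only illustrates the effect. Whether to remove the texts still depends on the research question.

```python
from collections import Counter


def word_counts(texts: dict[str, list[str]]) -> Counter:
    """Pool token counts across all texts in a corpus (text ID -> tokens)."""
    counts = Counter()
    for tokens in texts.values():
        counts.update(tokens)
    return counts


def per_million(counts: Counter, word: str) -> float:
    """Normalised frequency of `word` per million tokens."""
    total = sum(counts.values())
    return counts[word] * 1_000_000 / total if total else 0.0


# Hypothetical toy corpora, keyed by text ID
study = {"burns_01": ["the", "lass", "the", "lass"]}
general = {
    "burns_01": ["the", "lass", "the", "lass"],  # same text is also in the general corpus
    "other_01": ["the", "man", "the", "town"],
    "other_02": ["a", "lass", "and", "a", "man"],
}

# Reference corpus with the study texts removed, so they are not counted twice
reference = {tid: toks for tid, toks in general.items() if tid not in study}

print(per_million(word_counts(general), "lass"))    # with the overlap
print(per_million(word_counts(reference), "lass"))  # overlap removed: lower
```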

    1. Markup is the way of adding annotation to the text. Tags are a part of the annotation itself. If you are applying a standard tagset (e.g. POS tags, USAS tags)

      Tags are part of markup and are used to annotate the text or corpus.

    2. Corpora are collections of natural language that are stored in electronic format. They provide evidence to help us identify patterns, trends and changes in language use that we might otherwise not be able to identify.

      A wonderful use of corpora: to develop linguistic theory, and to test and validate hypotheses. For the last two, all we need is data, and a corpus means data (in enormous amounts).

    3. I would argue that the difference between annotation (tagging) and markup is that markup involves adding information (either linguistic or non-linguistic) to a text, whilst linguistic annotation explicitly involves adding linguistic information (and in this sense might be viewed as a sub-category of markup, used as an umbrella term).

      So here's the trouble: markup and annotation are two different things.