
Sunday, April 25, 2010

Context-Free Language

I. Context-Free Grammars

# Formal Definition of Context-Free Grammars

A context-free grammar is a grammar in which the left-hand side of each production rule consists of only a single nonterminal symbol. This restriction is non-trivial; not all languages can be generated by context-free grammars. Those that can are called context-free languages.

Not every language is context-free. For example, the language \left \{ a^{n}b^{n}c^{n} \mid n \ge 1 \right \} is not, and this can be strictly proven using the pumping lemma for context-free languages. By contrast, the language \left \{ a^{n}b^{n} \mid n \ge 1 \right \} (at least one a followed by the same number of b's) is context-free, as it can be defined by the grammar G2 with N=\left \{S\right \}, \Sigma=\left \{a,b\right \}, S the start symbol, and the following production rules:

1. S \rightarrow aSb
2. S \rightarrow ab
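The two production rules of G2 translate directly into a recursive membership test. This is only a minimal sketch specific to this grammar, not a general-purpose parser:

```python
def in_L(w):
    """Check membership in {a^n b^n | n >= 1} by mirroring G2's rules."""
    if w == "ab":                      # rule 2: S -> ab
        return True
    # rule 1: S -> aSb, so strip a leading 'a' and a trailing 'b' and recurse
    return len(w) >= 4 and w[0] == "a" and w[-1] == "b" and in_L(w[1:-1])
```

For example, in_L("aaabbb") is True, while in_L("aab") and in_L("") are False.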

A context-free language can be recognized in O(n³) time (see Big O notation) by an algorithm such as Earley's algorithm. That is, for every context-free language, a machine can be built that takes a string as input and determines in O(n³) time whether the string is a member of the language, where n is the length of the string.[7] Further, some important subsets of the context-free languages can be recognized in linear time using other algorithms.


# Example: A Grammar That Is Not Context-Free

For these examples, formal languages are specified using set-builder notation.

Consider the grammar G where N = \left \{S, B\right \}, \Sigma = \left \{a, b, c\right \}, S is the start symbol, and P consists of the following production rules (note that rules 3 and 4 have more than one symbol on their left-hand side, so G is not context-free; it is context-sensitive):

1. S \rightarrow aBSc
2. S \rightarrow abc
3. Ba \rightarrow aB
4. Bb \rightarrow bb

Some examples of the derivation of strings in \boldsymbol{L}(G) are:

  • \boldsymbol{S} \Rightarrow_2 \boldsymbol{abc}
  • \boldsymbol{S} \Rightarrow_1 \boldsymbol{aBSc} \Rightarrow_2 aB\boldsymbol{abc}c \Rightarrow_3 a\boldsymbol{aB}bcc \Rightarrow_4 aa\boldsymbol{bb}cc
  • \boldsymbol{S} \Rightarrow_1 \boldsymbol{aBSc} \Rightarrow_1 aB\boldsymbol{aBSc}c \Rightarrow_2 aBaB\boldsymbol{abc}cc \Rightarrow_3 a\boldsymbol{aB}Babccc \Rightarrow_3 aaB\boldsymbol{aB}bccc  \Rightarrow_3 aa\boldsymbol{aB}Bbccc \Rightarrow_4 aaaB\boldsymbol{bb}ccc \Rightarrow_4 aaa\boldsymbol{bb}bccc
(Note on notation: P \Rightarrow_i Q reads "String P generates string Q by means of production i", and the generated part is each time indicated in bold type.)

This grammar defines the language L = \left \{ a^{n}b^{n}c^{n} \mid n \ge 1 \right \}, where a^{n} denotes a string of n consecutive a's. Thus, the language is the set of strings that consist of 1 or more a's, followed by the same number of b's, followed by the same number of c's.
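The derivations above can be checked mechanically. The sketch below does a brute-force search over sentential forms, applying the four rules as string rewrites; since no rule shortens a string, any form longer than the target can be pruned:

```python
# Production rules of G as string rewrites (lhs, rhs)
RULES = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]

def derivable(target):
    """Brute-force check whether G derives `target` from S."""
    seen, frontier = {"S"}, ["S"]
    while frontier:
        form = frontier.pop()
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if new == target:
                    return True
                # every rule is non-contracting, so longer forms are dead ends
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                i = form.find(lhs, i + 1)
    return False
```

Here derivable("aabbcc") and derivable("aaabbbccc") come out True, matching the derivations shown above, while derivable("aabbc") is False.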


# Designing Context-Free Grammars

Some basic techniques:

• Matching – enforcing the fact that your grammar generates matching pairs of symbols.

Example: L = \left \{ 0^{n}1^{2n} \mid n \ge 0 \right \}. A context-free grammar for L is

S \rightarrow 0S11 \mid ε

The grammar forces every 0 to match 11.

What about L = \left \{ 0^{n}1^{m} \mid 2n \le m \le 3n \right \}? (Let each 0 match either 11 or 111: S \rightarrow 0S11 \mid 0S111 \mid ε.)

Another example: L = \left \{ w \mid w \in \left \{0,1\right \}^{*} and w is of even length \right \}.

S \rightarrow 0S0 \mid 0S1 \mid 1S0 \mid 1S1 \mid ε

The grammar forces every symbol to match another symbol.

• Using recursive relationships – when you can represent strings in the language in terms of shorter strings in the language.

Example: L – the language of all strings of balanced brackets. For example, (()())() \in L, but (() \notin L.

Every string w in L can be viewed as either
- uv, for u \in L, v \in L, or
- (u), for u \in L.

This gives the following grammar:

S \rightarrow SS \mid (S) \mid ε

• Thinking of each nonterminal A as representing the language of all strings derivable from A, then combining these languages into one using the rules.

Example: L = \left \{ 0^{n}1^{m} \mid n \ne m \right \}.

Let L_{1} = \left \{ 0^{n}1^{m} \mid n > m \right \} and L_{2} = \left \{ 0^{n}1^{m} \mid n < m \right \}; then L = L_{1} \cup L_{2}, and a grammar for L can combine grammars for L_{1} and L_{2} with a start rule S \rightarrow S_{1} \mid S_{2}.
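The union technique can be made concrete by enumeration. The sketch below uses a grammar for \{0^{n}1^{m} \mid n \ne m\} built as a union of two sub-grammars (the nonterminal names S, A, B are chosen here for illustration) and generates every derivable string up to a length bound:

```python
import re

# A grammar for {0^n 1^m | n != m}, split as L1 (n > m) union L2 (n < m).
GRAMMAR = {
    "S": ["A", "B"],
    "A": ["0A", "0A1", "0"],   # derives 0^n 1^m with n > m
    "B": ["B1", "0B1", "1"],   # derives 0^n 1^m with n < m
}

def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    words, seen, frontier = set(), {"S"}, ["S"]
    while frontier:
        form = frontier.pop()
        nt = next((c for c in form if c.isupper()), None)
        if nt is None:
            if len(form) <= max_len:
                words.add(form)
            continue
        i = form.index(nt)
        for rhs in GRAMMAR[nt]:
            new = form[:i] + rhs + form[i + 1:]
            # each form has at most one nonterminal, so its terminal count
            # (len - 1) is a lower bound on the final string length
            if len(new) - 1 <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words
```

Generating up to length 6 yields exactly the strings 0^{a}1^{b} with a \ne b and a + b \le 6, e.g. "0", "1", and "00111", but never "01" or "0011".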


# Ambiguity

If a CFG generates the same string in several different ways, we say that the string is derived ambiguously in that grammar. If a CFG generates some string ambiguously we say that the grammar is ambiguous.

Example: the grammar G5, whose rules are:

E \rightarrow E + E \mid E * E \mid (E) \mid a

generates some arithmetic expressions ambiguously. For instance, a + a * a has two distinct parse trees: one grouping it as a + (a * a) and one as (a + a) * a.
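One way to exhibit ambiguity concretely is to count distinct parse trees. A sketch that counts, for a string over {a, +, *, (, )}, the number of ways it can be derived from E under G5's rules:

```python
from functools import lru_cache

def parse_count(s):
    """Number of distinct parse trees deriving s from E under G5's rules."""
    @lru_cache(maxsize=None)
    def count(i, j):  # parse trees for the substring s[i:j]
        n = 0
        if s[i:j] == "a":                          # E -> a
            n += 1
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            n += count(i + 1, j - 1)               # E -> (E)
        for k in range(i + 1, j - 1):              # E -> E+E and E -> E*E
            if s[k] in "+*":
                n += count(i, k) * count(k + 1, j)
        return n
    return count(0, len(s))
```

Here parse_count("a+a*a") is 2, confirming the ambiguity, while the fully parenthesized parse_count("(a+a)*a") is 1.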


# Chomsky Normal Form

In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:

A \rightarrow BC or
A \rightarrow α or
S \rightarrow ε

where A, B and C are nonterminal symbols, α is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as (Hopcroft and Ullman, 1979). As pointed out by Lange and Leiß, the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. Using |G| to denote the size of the original grammar G, the size blow-up in the worst case may range from |G|² to 2^{2|G|}, depending on the transformation algorithm used (Lange and Leiß, 2009).

Another way to define Chomsky normal form (e.g., Hopcroft and Ullman 1979, and Hopcroft et al. 2006) is:

A formal grammar is in Chomsky normal form if all of its production rules are of the form:

A \rightarrow BC or
A \rightarrow α

where A, B and C are nonterminal symbols, and α is a terminal symbol. When using this definition, B or C may be the start symbol.
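Chomsky normal form is what makes simple O(n³) recognition possible: the CYK algorithm fills a table recording which nonterminals derive each substring. A sketch, using an assumed CNF grammar for \{a^{n}b^{n} \mid n \ge 1\} (S \rightarrow AT \mid AB, T \rightarrow SB, A \rightarrow a, B \rightarrow b; the nonterminal names are illustrative):

```python
def cyk(word, start, unary, binary):
    """CYK recognition for a grammar in Chomsky normal form."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar without S -> ε derives no empty string
    # table[i][l] = set of nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, a in unary if a == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                for A, (B, C) in binary:
                    if B in table[i][split] and C in table[i + split][span - split]:
                        table[i][span].add(A)
    return start in table[0][n]

# CNF grammar for {a^n b^n | n >= 1}
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B"))]
```

With this grammar, cyk("aabb", "S", UNARY, BINARY) is True and cyk("aab", "S", UNARY, BINARY) is False. The three nested loops over span, start position, and split point give the O(n³) bound.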




II. Pushdown Automata
# Formal Definition of a Pushdown Automaton

A PDA is formally defined as a 7-tuple M = (Q, \Sigma, \Gamma, \delta, q_{0}, Z, F), where

  • Q is a finite set of states
  • \Sigma is a finite set which is called the input alphabet
  • \Gamma is a finite set which is called the stack alphabet
  • \delta is a finite subset of Q \times (\Sigma \cup \{ε\}) \times \Gamma \times Q \times \Gamma^{*}, the transition relation
  • q_{0} \in Q is the start state
  • Z \in \Gamma is the initial stack symbol
  • F \subseteq Q is the set of accepting states

An element (p, a, A, q, \alpha) \in \delta is a transition of M. It has the intended meaning that M, in state p, with a on the input and with A as topmost stack symbol, may read a, change the state to q, and pop A, replacing it by pushing \alpha. The letter ε (epsilon) denotes the empty string, and the ε component of the transition relation is used to formalize that the PDA can either read a letter from the input or proceed leaving the input untouched.

In many texts the transition relation is replaced by an (equivalent) formalization, where

  • \delta is the transition function, mapping Q \times (\Sigma \cup \{ε\}) \times \Gamma into finite subsets of Q \times \Gamma^{*}.

Here \delta(p, a, A) contains all possible actions in state p with A on the stack, while reading a on the input. One writes (q, \alpha) \in \delta(p, a, A) for the function precisely when (p, a, A, q, \alpha) \in \delta for the relation. Note that finite in this definition is essential.


# Example of a Pushdown Automaton

The following is the formal description of the PDA which recognizes the language \left \{ 0^{n}1^{n} \mid n \ge 0 \right \} by final state:


PDA for \left \{ 0^{n}1^{n} \mid n \ge 0 \right \} (by final state)

M = (Q, \Sigma, \Gamma, \delta, q_{0}, Z, F), where Q = \{p, q, r\}, \Sigma = \{0, 1\}, \Gamma = \{A, Z\}, q_{0} = p, F = \{r\}, and

\delta consists of the following six instructions:

(p, 0, Z, p, AZ), (p, 0, A, p, AA), (p, ε, Z, q, Z), (p, ε, A, q, A), (q, 1, A, q, ε), and (q, ε, Z, r, Z).

In words, in state p, for each 0 read, one A is pushed onto the stack. Pushing symbol A on top of another A (or on top of Z) is formalized as replacing the top by AA (respectively AZ). In state q, for each 1 read, one A is popped. At any moment the automaton may move from state p to state q, while it may move from state q to accepting state r only when the stack consists of a single Z.

There seems to be no generally used representation for PDA. Here we have depicted the instruction (p, a, A, q, \alpha) by an edge from state p to state q labelled by a; A/\alpha (read a; replace A by \alpha).
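A standard textbook PDA for \{0^{n}1^{n} \mid n \ge 0\} by final state (states p, q, r and stack symbols A, Z, as is usual for this example) can be run directly. A small simulator sketch follows: since a PDA is nondeterministic, it searches over configurations (state, input position, stack), with None marking ε-moves:

```python
# transitions: (state, input symbol or None for ε, stack top) -> [(next state, push string)]
DELTA = {
    ("p", "0", "Z"): [("p", "AZ")],
    ("p", "0", "A"): [("p", "AA")],
    ("p", None, "Z"): [("q", "Z")],
    ("p", None, "A"): [("q", "A")],
    ("q", "1", "A"): [("q", "")],
    ("q", None, "Z"): [("r", "Z")],
}

def accepts(word, start="p", bottom="Z", final=("r",)):
    """Search the PDA's configuration graph for an accepting configuration."""
    frontier = [(start, 0, bottom)]
    seen = set(frontier)
    while frontier:
        state, i, stack = frontier.pop()
        if i == len(word) and state in final:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        nexts = []
        if i < len(word):                                  # read a letter
            for s, push in DELTA.get((state, word[i], top), []):
                nexts.append((s, i + 1, push + rest))
        for s, push in DELTA.get((state, None, top), []):  # ε-move
            nexts.append((s, i, push + rest))
        for cfg in nexts:
            if cfg not in seen:
                seen.add(cfg)
                frontier.append(cfg)
    return False
```

With these instructions, accepts(""), accepts("01") and accepts("0011") are True, while accepts("0"), accepts("011") and accepts("10") are False.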


# Equivalence with Context-Free Grammars

CFGs and PDAs are equivalent in power: a CFG generates a context-free language, and a PDA recognizes a context-free language. We show here how to convert a CFG into a PDA that recognizes the language specified by the CFG, and vice versa. Application: this equivalence allows a CFG to be used to specify a programming language and the equivalent PDA to be used to implement its compiler's parser.
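The CFG-to-PDA direction has a particularly direct construction: the PDA keeps the unmatched part of a sentential form on its stack, expanding nonterminals by ε-moves and matching terminals against the input. A sketch of that construction as a search, using G2 from the first section; the pruning step assumes every grammar symbol derives at least one terminal, which holds for G2:

```python
def cfg_accepts(word, grammar, start="S"):
    """Simulate the standard CFG->PDA construction: the stack holds the
    part of the sentential form not yet matched against the input."""
    frontier = [(0, start)]
    seen = set(frontier)
    while frontier:
        i, stack = frontier.pop()
        if not stack:
            if i == len(word):          # empty stack, all input read: accept
                return True
            continue
        top, rest = stack[0], stack[1:]
        if top in grammar:                        # ε-move: expand nonterminal
            nexts = [(i, rhs + rest) for rhs in grammar[top]]
        elif i < len(word) and word[i] == top:    # read-move: match terminal
            nexts = [(i + 1, rest)]
        else:
            nexts = []
        for cfg in nexts:
            # prune: each stack symbol must yield at least one input letter
            if len(cfg[1]) <= len(word) - cfg[0] and cfg not in seen:
                seen.add(cfg)
                frontier.append(cfg)
    return False

G2 = {"S": ["aSb", "ab"]}  # the grammar G2 from the first section
```

For example, cfg_accepts("aaabbb", G2) is True and cfg_accepts("aab", G2) is False, matching the recursive recognizer given earlier for the same language.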
