
Thursday, January 24, 2008

Parsing and processing C++ source code

It is relatively difficult to write a good C++ parser with classic parsing algorithms such as LALR(1).[5] This is partly because the C++ grammar is not LALR. As a result, there are very few tools for analyzing existing code or performing non-trivial transformations on it (e.g., refactoring). One way to handle this difficulty is to choose a different syntax, such as Significantly Prettier and Easier C++ Syntax, which is LALR(1)-parsable. Parsers based on more powerful algorithms, such as GLR, can be substantially simpler, though slower.
Parsing (in the literal sense of producing a syntax tree) is not the most difficult problem in building a C++ processing tool. Such tools must also understand the meaning of the identifiers in a program as fully as a compiler does. Practical systems for processing C++ must therefore not only parse the source text, but also resolve, for each identifier, precisely which definition applies (which means correctly handling C++'s complex scoping rules) and what its type is, as well as the types of larger expressions.
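As an illustration of why a syntax tree alone is not enough, the sketch below uses the identical token sequence A * b; twice. Whether it is a declaration or an expression depends entirely on what A means in the enclosing scope, which a tool can only know by doing the same name lookup a compiler does. The names A, b, target, and parse_depends_on_names are hypothetical, chosen only for this example.

```cpp
#include <cassert>

typedef int A;     // at namespace scope, A names a type
int target = 1;    // an object for the pointer declared below to point at

// Hypothetical helper that hosts the two readings of "A * b;".
int parse_depends_on_names() {
    A * b;              // A is a type here: this line DECLARES b as a pointer to int
    b = &target;
    {
        int A = 6, b = 7;   // inner scope: A and b are now ordinary int variables
        A * b;              // identical tokens, but now an expression statement
                            // (a multiplication whose result is discarded)
        assert(A * b == 42);
    }
    return *b;          // back in the outer scope, b is the int* declared above
}
```

A tool that does not track which names denote types cannot tell these two readings apart; one strategy used with GLR parsers is to keep both parses and let later name resolution pick the right one.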
Finally, a practical C++ processing tool must be able to handle the variety of C++ dialects used in practice (such as those supported by the GNU C compiler and by Microsoft's Visual C++), provide appropriate analyzers and source code transformers, and regenerate source text. Combining advanced parsing algorithms such as GLR with symbol table construction and program transformation machinery can enable the construction of arbitrary C++ tools.
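For instance, real-world code often selects compiler-specific extensions through the preprocessor, so a tool has to decide which dialect (and which predefined macros, such as _MSC_VER for Visual C++ or __GNUC__ for GCC and compatible compilers) to assume before it can even see the declarations. This is an illustrative sketch only; the macro PORTABLE_NOINLINE and the function square are hypothetical.

```cpp
#include <cassert>

// Hypothetical macro: expands to whichever "do not inline" extension
// the current compiler dialect understands.
#if defined(_MSC_VER)
#  define PORTABLE_NOINLINE __declspec(noinline)      // Microsoft Visual C++ extension
#elif defined(__GNUC__)
#  define PORTABLE_NOINLINE __attribute__((noinline))  // GNU extension (Clang also accepts it)
#else
#  define PORTABLE_NOINLINE                            // unknown compiler: no extension
#endif

// A tool analyzing this declaration must know which branch above applies.
PORTABLE_NOINLINE int square(int x) {
    return x * x;
}
```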

Problems and controversies

Standards compliance
Producing a reasonably standards-compliant C++ compiler has proven to be a difficult task for compiler vendors in general. For many years, different C++ compilers implemented the language with differing degrees of compliance to the standard, and their implementations varied widely in some areas, such as partial template specialization. Recent releases of most popular C++ compilers support almost all of the C++ 1998 standard.[6]
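For readers unfamiliar with the feature, partial template specialization lets a class template supply an alternative definition for a whole family of argument types (for example, every pointer type) rather than for one exact type. A minimal sketch, using a hypothetical template named Describe:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Primary template: chosen when no specialization matches.
template <typename T>
struct Describe {
    static std::string kind() { return "value"; }
};

// Partial specialization: matches every pointer type T*, whatever T is.
template <typename T>
struct Describe<T*> {
    static std::string kind() { return "pointer"; }
};

// Partial specialization: matches every array type T[N] with a known bound.
template <typename T, std::size_t N>
struct Describe<T[N]> {
    static std::string kind() { return "array"; }
};
```

With these definitions, Describe<int>::kind() yields "value", Describe<double*>::kind() yields "pointer", and Describe<char[4]>::kind() yields "array"; a compiler that mishandles partial specialization would reject or misroute code like this.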
One particular point of contention is the export keyword, intended to allow template definitions to be separated from their declarations. The first compiler to implement export was Comeau C/C++, in early 2003 (5 years after the release of the standard); in 2004, the beta compiler of Borland C++ Builder X was also released with export. Both of these compilers are based on the EDG C++ front end. Other compilers, such as GCC, do not support it at all. Many C++ books nevertheless provide example code that uses export (for example, Beginning ANSI C++ by Ivor Horton) without mentioning that it will not compile with most compilers. Herb Sutter, then convener of the ISO C++ standards committee, recommended that export be removed from future versions of the C++ standard,[7] but the committee ultimately decided to retain it.[8]
To give compiler vendors greater freedom, the C++ standards committee decided not to dictate the implementation of name mangling, exception handling, and other implementation-specific features. The downside of this decision is that object code produced by different compilers is expected to be incompatible. There are, however, third-party standards for particular machines or operating systems that attempt to standardize compilers on those platforms (for example, the C++ ABI[9]), and some compilers adopt such a secondary standard for these items.
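To make name mangling concrete: under the Itanium C++ ABI, which GCC and Clang follow on Linux and most other non-Windows platforms, the function int f(int) is emitted as the symbol _Z1fi, while Microsoft's compiler uses its own, different scheme. The sketch below assumes a GCC- or Clang-compatible toolchain, because it relies on the non-standard header <cxxabi.h> and its abi::__cxa_demangle function; the wrapper name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <string>
#include <cxxabi.h>   // GCC/Clang-specific: declares abi::__cxa_demangle

// Converts an Itanium-ABI mangled symbol back into a readable signature.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null buffer, __cxa_demangle allocates the result with malloc.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);   // the caller owns the malloc'd buffer
    return result;
}
```

For example, demangle("_ZN2ns3addEii") recovers ns::add(int, int); an MSVC-built object file would spell the same function differently, which is one concrete reason object code from the two compilers does not link together.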

Incompatibility with C
For more details on this topic, see Compatibility of C and C++.
C++ is often considered to be a superset of C, but this is not strictly true.[13] Most C code can easily be made to compile correctly in C++, but a few differences cause some valid C code to be invalid in C++, or to behave differently in C++.
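A small example of code whose meaning changes between the languages: a character literal such as 'a' has type int in C but type char in C++, so sizeof('a') evaluates differently. A sketch of the C++ side (the function name char_literal_size is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// In C++ the literal 'a' has type char, so sizeof('a') == 1 on every platform.
// Compiled as C, the same expression yields sizeof(int) (commonly 4),
// because C gives character literals type int.
std::size_t char_literal_size() {
    static_assert(std::is_same<decltype('a'), char>::value,
                  "in C++ a character literal has type char");
    return sizeof('a');
}
```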
One commonly encountered difference is that C allows implicit conversion from void* to other pointer types, but C++ does not. So, the following is valid C code:
int *i = malloc(sizeof(int) * 5); /* Implicit conversion from void* to int* */
... but to make it work in both C and C++ one would need to use an explicit cast:
int *i = (int *) malloc(sizeof(int) * 5);
...and in C++-only code, static_cast is recommended:
int *i = static_cast<int *>(malloc(sizeof(int) * 5));
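Putting the C++ form into a complete function shows that the explicit conversion is the only change relative to the C version, and that the memory must still be released with free. This is a minimal sketch; make_sequence is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free, std::size_t

// Allocates n ints with std::malloc and fills them with 0, 1, ..., n-1.
// Returns nullptr if the allocation fails; the caller must std::free the result.
int* make_sequence(std::size_t n) {
    // C++ requires this explicit conversion from void* to int*.
    int* p = static_cast<int*>(std::malloc(sizeof(int) * n));
    if (p == nullptr) {
        return nullptr;
    }
    for (std::size_t k = 0; k < n; ++k) {
        p[k] = static_cast<int>(k);
    }
    return p;
}
```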
C++ also makes a small change in the behavior of the "conditional" operator. Consider:
// The interpretation of the following expression is different in C and C++
bool flag = xxxx;
int a;
flag ? a=2 : a=3;

// In C the precedence of ?: is strictly higher than that of all assignment operators,
// so the expression is grouped as follows, because ?: binds more tightly than =
(flag ? (a=2) : a) = 3;
// ...which C then rejects, because the result of ?: is not an lvalue in C

// The C++ grammar allows an assignment as the else-part, so it's interpreted as:
( flag ? (a=2) : (a=3) );
So, in essence, the "?" and ":" behave as if they had different precedence levels in C++: the "?" is placed higher than the assignment operators, and the ":" is placed lower, because the operand after it may be a full assignment expression.
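The C++ reading can be checked with a short function (a sketch; assign_via_conditional is a hypothetical name):

```cpp
#include <cassert>

// In C++ the operand after ':' may be an assignment-expression, so the
// statement below is parsed as  flag ? (a = 2) : (a = 3);
// In C the same tokens would be grouped as (flag ? (a = 2) : a) = 3,
// which C rejects because ?: does not yield an lvalue there.
int assign_via_conditional(bool flag) {
    int a = 0;
    flag ? a = 2 : a = 3;
    return a;
}
```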
Another common portability issue is that C++ defines many new keywords, such as new and class, which are ordinary identifiers in C; a C program that uses them as identifiers (e.g. variable names) will not compile as C++.
Some incompatibilities have been removed by the latest (C99) C standard, which now supports C++ features such as // comments and mixed declarations and code. However, C99 introduced a number of new features that C++ does not support (such as variable-length arrays, native complex-number types, and compound literals), so the languages may be diverging more than they are converging. Even so, at least some of the new C99 features will likely be included in the next version of the C++ standard, C++0x.
To intermix C and C++ code, any C functions and variables that are called from or used in C++ must be declared with C linkage, for example by placing their declarations within an extern "C" { ... } block.
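A common way to meet this requirement is a header that both languages can include, guarded by the __cplusplus macro that C++ compilers predefine. In the sketch below, the header contents and the function c_add are hypothetical, and the "C" implementation is placed in the same C++ file only so the example is self-contained; normally it would live in a separate .c file compiled by a C compiler.

```cpp
#include <cassert>

// ---- contents of a hypothetical shared header, e.g. clib.h ----
#ifdef __cplusplus
extern "C" {          // seen only by C++ compilers: give these names C linkage
#endif

int c_add(int a, int b);  // no C++ name mangling, so it can link against C object code

#ifdef __cplusplus
}                     // end of the extern "C" block
#endif
// ---- end of header ----

// Stand-in for the C implementation (normally in clib.c, compiled as C).
extern "C" int c_add(int a, int b) {
    return a + b;
}
```

Because a C compiler never defines __cplusplus, it sees only the plain declaration, while a C++ compiler sees the same declaration wrapped in extern "C".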
