Based on the references listed below, I modified the code collected by ChunMinChang and added a blankLineRemover function that combines multiple blank lines into a single empty line. This Python code is useful for processing code generated by Simulink Embedded Coder: the generated files may contain annoying comments about version control, but I don't want to disable code comments entirely, because they are needed for traceability. You may also find it useful elsewhere.

Check my GitHub gist for updates.

Python code

import re
import os.path

# Method 1: As described by saltycrane (https://www.saltycrane.com/blog/2007/11/remove-c-comments-python/):
#           This regular expression was created by Jeffrey Friedl and later modified by Fred Curtis.
#           Check the above link for details.
def removeComments(text):
    """ remove c-style comments.
        text: blob of text with comments (can include newlines)
        returns: text with comments removed
    """
    pattern = r"""
                            ##  --------- COMMENT ---------
           //.*?$           ##  Start of // .... comment
         |                  ##
           /\*              ##  Start of /* ... */ comment
           [^*]*\*+         ##  Non-* followed by 1-or-more *'s
           (                ##
             [^/*][^*]*\*+  ##
           )*               ##  0-or-more things which don't start with /
                            ##    but do end with '*'
           /                ##  End of /* ... */ comment
         |                  ##  -OR-  various things which aren't comments:
           (                ##
                            ##  ------ " ... " STRING ------
             "              ##  Start of " ... " string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^"\\]       ##  Non "\ characters
             )*             ##
             "              ##  End of " ... " string
           |                ##  -OR-
                            ##
                            ##  ------ ' ... ' STRING ------
             '              ##  Start of ' ... ' string
             (              ##
               \\.          ##  Escaped char
             |              ##  -OR-
               [^'\\]       ##  Non '\ characters
             )*             ##
             '              ##  End of ' ... ' string
           |                ##  -OR-
                            ##
                            ##  ------ ANYTHING ELSE -------
              .              ##  Any other char
              [^/"'\\]*      ##  Chars which don't start a comment, string
           )                ##    or escape
    """
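    regex = re.compile(pattern, re.VERBOSE|re.MULTILINE|re.DOTALL)
    # Every part of the text is matched either as a comment or as group 2
    # (strings and other non-comment text). Comment matches leave group 2 as
    # None, so joining the group 2 captures drops the comments.
    noncomments = [m.group(2) for m in regex.finditer(text) if m.group(2)]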

    return "".join(noncomments)

# Method 2: Answered by Markus Jarderot on Stack Overflow
#           Link: https://stackoverflow.com/questions/241327/remove-c-and-c-comments-using-python
def commentRemover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return " " # note: a space and not an empty string
        else:
            return s
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)

# Combine multiple blank lines into one empty line
# Added by Technblogy.com
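def blankLineRemover(text):
    # In MULTILINE mode ^ and $ match at every line boundary, and \s* can
    # span newlines, so one match covers a run of blank or whitespace-only
    # lines, stopping just before the newline that ends the last of them.
    # Deleting the match leaves a single empty line.
    pattern = re.compile(r'^\s*$', re.MULTILINE)
    return pattern.sub("", text)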

filenames = ['test.h']
for filename in filenames:
    with open(filename) as f:
        text = f.read()
    # Pick one of the two comment removers:
    # uncmtFile = removeComments(text)
    uncmtFile = commentRemover(text)
    # Combine multiple blank lines into one empty line
    uncmtFile = blankLineRemover(uncmtFile)
    # Write the result next to the input, e.g. test.h -> test.nocmt.h
    root, ext = os.path.splitext(filename)
    with open(root + '.nocmt' + ext, "w") as new_file:
        new_file.write(uncmtFile)
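The loop above processes a hard-coded file list. For a folder of Embedded Coder output it may be handier to pass file names on the command line. The following is a minimal sketch under that assumption, not part of the original script: it copies Method 2 and the blank-line collapsing so it runs on its own, writes each result next to its input as <name>.nocmt<ext>, and skips files that already carry the .nocmt suffix. The script name remove_comments.py and the helper strip_file are hypothetical.

```python
import os.path
import re
import sys

# Method 2 (Markus Jarderot): comments become one space, literals are kept.
COMMENT_PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE,
)


def commentRemover(text):
    def replacer(match):
        s = match.group(0)
        return " " if s.startswith('/') else s
    return COMMENT_PATTERN.sub(replacer, text)


def blankLineRemover(text):
    # Collapse runs of blank or whitespace-only lines into one empty line.
    return re.sub(r'^\s*$', '', text, flags=re.MULTILINE)


def strip_file(filename):
    """Hypothetical helper: write <root>.nocmt<ext> and return its path."""
    with open(filename) as f:
        text = f.read()
    root, ext = os.path.splitext(filename)
    out_name = root + '.nocmt' + ext
    with open(out_name, "w") as new_file:
        new_file.write(blankLineRemover(commentRemover(text)))
    return out_name


def main(args):
    if not args:
        print("usage: python remove_comments.py FILE [FILE ...]", file=sys.stderr)
        return
    for filename in args:
        # Skip outputs of an earlier run, e.g. test.nocmt.h
        if os.path.splitext(filename)[0].endswith('.nocmt'):
            continue
        print(strip_file(filename))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Example invocation, on a shell that expands wildcards: python remove_comments.py *.c *.h. The suffix check matters there, because a second run would otherwise also pick up earlier outputs such as test.nocmt.h.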

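Method 2 replaces each comment with a single space rather than an empty string, mirroring the C preprocessor, which also replaces every comment with one space character. This keeps adjacent tokens apart: int/**/x becomes int x, not intx. A side effect shows up in the test below: the /* ... */ comment spanning three lines collapses onto one line, so every later line moves up. Where line numbers matter for traceability, one option is to re-emit the comment's newlines as well (blankLineRemover also shifts lines, so it would have to be skipped too). The variant below is a hedged sketch of that idea, not part of the original code; commentRemoverKeepLines is a hypothetical name, and the regex is the one commentRemover uses.

```python
import re

# Same regex as commentRemover (Method 2): // comments, /* */ comments, and
# '...' / "..." literals, matched so that comment markers inside them are ignored.
COMMENT_PATTERN = re.compile(
    r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
    re.DOTALL | re.MULTILINE,
)


def commentRemoverKeepLines(text):
    """Hypothetical variant of commentRemover that keeps the line count."""
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            # One space keeps adjacent tokens apart (int/**/x -> int x);
            # re-emitting the comment's newlines keeps later lines in place.
            return " " + "\n" * s.count("\n")
        return s  # string or character literal: unchanged
    return COMMENT_PATTERN.sub(replacer, text)


src = "int/**/x;\n/* two\n   lines */\ny = 1;\n"
out = commentRemoverKeepLines(src)
print(repr(out))  # 'int x;\n \n\ny = 1;\n'
```

The // branch needs no change, because //.*?$ stops before the newline and therefore never contains one.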
Test

Input file:

/* This is a C-style comment. */
This is not a comment.
/* This is another
 * C-style comment.
 */
"This is /* also not a comment */"
// This is also a comment
This is still // a comment

     //blank lines

This is still /* a comment */
//blank comments
//blank comments
This is still /* a comment */ again
This is the final line

Output file:


This is not a comment.

"This is /* also not a comment */"

This is still  

This is still  

This is still   again
This is the final line
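The input/output pair above can be kept as a regression check. This sketch assumes the input file ends with a newline and copies commentRemover (Method 2) and the blank-line collapsing into the block so it runs on its own; the names INPUT and EXPECTED are illustrative. Note the trailing spaces in EXPECTED: they are the single spaces that replaced the comments.

```python
import re


def commentRemover(text):
    # Method 2 from the post: comments -> one space, literals untouched.
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE,
    )
    def replacer(match):
        s = match.group(0)
        return " " if s.startswith('/') else s
    return pattern.sub(replacer, text)


def blankLineRemover(text):
    # Collapse runs of blank or whitespace-only lines into one empty line.
    return re.sub(r'^\s*$', '', text, flags=re.MULTILINE)


# The test input from the post (assumed to end with a newline).
INPUT = (
    '/* This is a C-style comment. */\n'
    'This is not a comment.\n'
    '/* This is another\n'
    ' * C-style comment.\n'
    ' */\n'
    '"This is /* also not a comment */"\n'
    '// This is also a comment\n'
    'This is still // a comment\n'
    '\n'
    '     //blank lines\n'
    '\n'
    'This is still /* a comment */\n'
    '//blank comments\n'
    '//blank comments\n'
    'This is still /* a comment */ again\n'
    'This is the final line\n'
)

# Expected output, including the trailing spaces left by replaced comments.
EXPECTED = (
    '\n'
    'This is not a comment.\n'
    '\n'
    '"This is /* also not a comment */"\n'
    '\n'
    'This is still  \n'
    '\n'
    'This is still  \n'
    '\n'
    'This is still   again\n'
    'This is the final line\n'
)

result = blankLineRemover(commentRemover(INPUT))
assert result == EXPECTED, repr(result)
print("output matches the post")
```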

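The blank-line handling in the output comes from a single regex substitution: blankLineRemover amounts to one re.sub call with an empty replacement. Below is a short check of its behavior, written as a standalone function with the hypothetical name collapse_blank_lines. Runs of blank or whitespace-only lines shrink to one empty line, a single blank line is left alone, and blank lines at the very end of the text disappear entirely, because $ also matches at the end of the string.

```python
import re


def collapse_blank_lines(text):
    # Same regex as blankLineRemover; \s matches newlines, so one match can
    # span several blank lines. DOTALL is unnecessary since there is no '.'.
    return re.sub(r'^\s*$', '', text, flags=re.MULTILINE)


print(repr(collapse_blank_lines("a\n\n\n  \nb\n")))  # 'a\n\nb\n'
print(repr(collapse_blank_lines("a\n\nb")))          # 'a\n\nb'
print(repr(collapse_blank_lines("a\n\n\n")))         # 'a\n'
```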
References

Eliot (saltycrane.com): How to remove C style comments using Python

Markus Jarderot (Stack Overflow answer): Remove C and C++ comments using Python?

ChunMinChang: remove_c_style_comments.py

Technblogy.com (modified version): remove_c_style_comments.py

