Getting Started: Parser¶
The parser module is the engine of the framework. It transforms raw
patch data (strings) into a structured hierarchy of objects like
DiffCodeFile and Hunk.
The Line Factory¶
The parser.PatchParser uses a central factory method to categorize
every line of a patch. This ensures that the correct specialized class is
instantiated based on the line’s prefix.
>>> from fitzzftw.patch.parser import PatchParser
>>> from fitzzftw.patch.lines import HeadLine, HunkHeadLine, HunkLine
>>> parser = PatchParser()
Testing the factory with different line types
>>> type(parser.create_line("--- old_file.py")) == HeadLine
True
>>> type(parser.create_line("@@ -1,1 +1,1 @@")) == HunkHeadLine
True
>>> type(parser.create_line("+ added line")) == HunkLine
True
The Line Stream¶
The method get_lines() is a generator that converts
strings into specialized line objects.
>>> from fitzzftw.patch.parser import PatchParser
>>> parser = PatchParser()
>>> stream = ["--- a/file.txt", "+++ b/file.txt", "+new content"]
>>> lines = list(parser.get_lines(stream))
>>> lines[0]
HeadLine(Content: 'a/file.txt', Prefix: '--- ')
>>> lines[2]
HunkLine(Content: 'new content', Prefix: '+')
Parsing a Patch Stream¶
The parser works as a generator. It processes an iterable of strings (like a
file object or a list of lines) and yields DiffCodeFile
objects. This “streaming” approach is memory efficient for large patches.
Here is a minimal example of parsing a raw diff string:
>>> diff_data = [
... "--- a/test.txt",
... "+++ b/test.txt",
... "@@ -1,1 +1,1 @@",
... "-old content",
... "+new content"
... ]
>>> files = list(parser.iter_files(diff_data))
>>> len(files)
1
>>> patch_file = files[0]
>>> patch_file.orig_header
HeadLine(Content: 'a/test.txt', Prefix: '--- ')
>>> len(patch_file.hunks)
1
Handling Git Diff Noise¶
Real-world diffs (like those from git diff) often contain irrelevant
metadata (e.g., index, mode, or extended headers). The parser is designed
to be tolerant: it safely ignores unknown lines and only processes
structurally relevant data.
Git metadata or random text is ignored
>>> noise = [
... "diff --git a/test.txt b/test.txt",
... "index 0000000..1234567",
... "--- a/test.txt",
... "+++ b/test.txt",
... "@@ -1,1 +1,1 @@",
... "-old content",
... "+new content"
... ]
>>> files = list(parser.iter_files(noise))
>>> len(files)
1
>>> files[0].orig_header
HeadLine(Content: 'a/test.txt', Prefix: '--- ')
Error Handling¶
While the parser ignores noise, it still validates the sequence of the
patch. If data appears in an impossible order (e.g., a hunk starts without
a preceding file header), it raises a PatchParseError.
A hunk header without leading ‘—’ / ‘+++’ headers is invalid
>>> invalid_structure = ["@@ -1,1 +1,1 @@", "+added"]
>>> list(parser.iter_files(invalid_structure))
Traceback (most recent call last):
...
fitzzftw.patch.exceptions.PatchParseError: Line 1: Found '@@ ' before file headers