Getting Started: Parser
=======================

The :mod:`.parser` module is the engine of the framework. It transforms raw 
patch data (strings) into a structured hierarchy of objects like 
:class:`~.container.DiffCodeFile` and :class:`~.container.Hunk`.

The Line Factory
----------------

The :class:`.parser.PatchParser` uses a central factory method to categorize 
every line of a patch. This ensures that the correct specialized class is 
instantiated based on the line's prefix.

    >>> from fitzzftw.patch.parser import PatchParser
    >>> from fitzzftw.patch.lines import HeadLine, HunkHeadLine, HunkLine

    >>> parser = PatchParser()
    
Testing the factory with different line types

    >>> type(parser.create_line("--- old_file.py")) == HeadLine
    True
    >>> type(parser.create_line("@@ -1,1 +1,1 @@")) == HunkHeadLine
    True
    >>> type(parser.create_line("+ added line")) == HunkLine
    True

The Line Stream
---------------

The method :meth:`~.PatchParser.get_lines` is a generator that converts 
strings into specialized line objects.

    >>> from fitzzftw.patch.parser import PatchParser
    >>> parser = PatchParser()
    >>> stream = ["--- a/file.txt", "+++ b/file.txt", "+new content"]
    >>> lines = list(parser.get_lines(stream))
    >>> lines[0]
    HeadLine(Content: 'a/file.txt', Prefix: '--- ')
    >>> lines[2]
    HunkLine(Content: 'new content', Prefix: '+')


Parsing a Patch Stream
----------------------

The parser works as a generator. It processes an iterable of strings (like a 
file object or a list of lines) and yields :class:`~.container.DiffCodeFile` 
objects. This "streaming" approach is memory efficient for large patches.

Here is a minimal example of parsing a raw diff string:

    >>> diff_data = [
    ...     "--- a/test.txt",
    ...     "+++ b/test.txt",
    ...     "@@ -1,1 +1,1 @@",
    ...     "-old content",
    ...     "+new content"
    ... ]
    
    >>> files = list(parser.iter_files(diff_data))
    >>> len(files)
    1

    >>> patch_file = files[0]
    >>> patch_file.orig_header
    HeadLine(Content: 'a/test.txt', Prefix: '--- ')

    >>> len(patch_file.hunks)
    1


Handling Git Diff Noise
-----------------------

Real-world diffs (like those from ``git diff``) often contain irrelevant 
metadata (e.g., index, mode, or extended headers). The parser is designed 
to be **tolerant**: it safely ignores unknown lines and only processes 
structurally relevant data.

Git metadata or random text is ignored

    >>> noise = [
    ...     "diff --git a/test.txt b/test.txt",
    ...     "index 0000000..1234567",
    ...     "--- a/test.txt",
    ...     "+++ b/test.txt",
    ...     "@@ -1,1 +1,1 @@",
    ...     "-old content",
    ...     "+new content"
    ... ]
    
    >>> files = list(parser.iter_files(noise))
    >>> len(files)
    1
    >>> files[0].orig_header
    HeadLine(Content: 'a/test.txt', Prefix: '--- ')


Error Handling
--------------

While the parser ignores noise, it still validates the **sequence** of the 
patch. If data appears in an impossible order (e.g., a hunk starts without 
a preceding file header), it raises a :class:`~.exceptions.PatchParseError`.

A hunk header without leading '---' / '+++' headers is invalid

    >>> invalid_structure = ["@@ -1,1 +1,1 @@", "+added"]
    >>> list(parser.iter_files(invalid_structure))
    Traceback (most recent call last):
        ...
    fitzzftw.patch.exceptions.PatchParseError: Line 1: Found '@@ ' before file headers