
row_filter

knime2py.nodes.row_filter

Row Filter module for KNIME to Python conversion.

Overview

This module generates Python code that filters rows of an input DataFrame based on predicates defined in a KNIME settings.xml file. The generated code constructs a boolean mask from the predicates, applies it to the DataFrame, and outputs the filtered result.
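The mask-based approach described above can be sketched as follows. This is a minimal illustration, not the code the module actually emits: the table contents, the two predicates, and the handling of a non-"MATCHING" output mode are assumptions made for the example.

```python
import pandas as pd

# Hypothetical input table; in generated code this would come from `context`.
df = pd.DataFrame({"age": [25, 40, 31], "city": ["Oslo", "Rome", "Oslo"]})

# One boolean Series per predicate (illustrative predicates only).
masks = [df["age"] > 30, df["city"] == "Oslo"]

match_and = True          # combine predicates with AND (False would mean OR)
output_mode = "MATCHING"  # keep rows where the combined mask is True

# Fold the per-predicate masks into a single combined mask.
mask = masks[0]
for m in masks[1:]:
    mask = (mask & m) if match_and else (mask | m)

# Assumption: any mode other than "MATCHING" keeps the complementary rows.
out_df = df[mask] if output_mode == "MATCHING" else df[~mask]
```

With these sample values only the last row satisfies both predicates, so `out_df` holds a single row.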

Runtime Behavior

Inputs:
- Reads a DataFrame from the context using the key format 'src_id:in_port'.

Outputs:
- Writes the filtered DataFrame to the context using the key format 'node_id:out_port'.

Key algorithms or mappings:
- The module supports various comparison operators (e.g., equality, greater than) and handles column normalization and missing values.

Edge Cases

The code implements safeguards for missing columns, empty predicates, and NaN values, ensuring that the filtering logic remains robust under various input conditions.
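One way such safeguards could look is sketched below. The helper name `filter_greater_than`, the `(column, threshold)` predicate shape, and the policy of skipping predicates on absent columns are all assumptions for illustration; the module's real behavior in these cases is not shown in this page.

```python
import pandas as pd

def filter_greater_than(df, predicates):
    """Keep rows where every (column, threshold) predicate holds.

    Hypothetical helper: illustrates guarding against empty predicate
    lists, missing columns, and NaN values.
    """
    # Empty predicates: nothing to filter on, pass the table through.
    if not predicates:
        return df.copy()

    mask = pd.Series(True, index=df.index)
    for column, threshold in predicates:
        # Assumed policy: silently skip predicates on columns that do not exist.
        if column not in df.columns:
            continue
        # NaN never satisfies ">", fillna(False) makes that explicit
        # and also covers nullable dtypes that would yield <NA>.
        mask = mask & (df[column] > threshold).fillna(False)
    return df[mask]

sample = pd.DataFrame({"x": [1.0, float("nan"), 5.0]})
kept = filter_greater_than(sample, [("x", 2)])
```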

Generated Code Dependencies

The generated code requires the following external libraries:
- pandas

These dependencies are required for the generated code, not for this module itself.

Usage

This module is typically invoked by the knime2py emitter as part of the conversion process from KNIME workflows to Python code. An example of expected context access:

df = context['src_id:in_port']  # input table

Node Identity

KNIME factory ID:
- FACTORY = "org.knime.base.node.preproc.filter.row3.RowFilterNodeFactory"

Configuration

The settings are defined in the RowFilterSettings dataclass, which includes:
- match_and: bool (default=True) - Determines if predicates are combined with AND or OR.
- output_mode: str (default="MATCHING") - Specifies whether to output matching or non-matching rows.
- predicates: List[Predicate] - Contains the filtering criteria.
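The shape of these settings can be sketched with plain dataclasses. The field names follow the description above and the `Predicate(column=..., operator=..., values=...)` call in the source; the `Predicate` defaults, the `List[str]` type for `values`, and the `"GT"` operator name are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Predicate:
    # Defaults here are assumed; the source only shows keyword construction.
    column: Optional[str] = None
    operator: Optional[str] = None
    values: List[str] = field(default_factory=list)

@dataclass
class RowFilterSettings:
    match_and: bool = True          # True: AND, False: OR
    output_mode: str = "MATCHING"   # which rows to emit
    # default_factory avoids sharing one list between instances
    predicates: List[Predicate] = field(default_factory=list)

# "GT" is a hypothetical operator name used only for this example.
settings = RowFilterSettings(
    predicates=[Predicate(column="age", operator="GT", values=["30"])]
)
```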

The parse_row_filter_settings function extracts these values from the settings.xml file using XPath queries.
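The extraction step can be approximated with the standard library alone. The sketch below uses `xml.etree.ElementTree` and its `{*}` namespace wildcard (Python 3.8+) instead of the `local-name()` XPath queries used by the module; the XML snippet, its namespace URI, and the `entry_value` helper are illustrative assumptions, not a real settings.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real settings.xml contains many more entries.
SETTINGS_XML = """
<config xmlns="http://www.knime.org/2008/09/XMLConfig" key="settings.xml">
  <config key="model">
    <entry key="matchCriteria" type="xstring" value="OR"/>
    <entry key="outputMode" type="xstring" value="MATCHING"/>
  </config>
</config>
"""

def entry_value(parent, key, default):
    """Return the upper-cased 'value' of the first matching <entry>, or the default."""
    # "{*}" matches the element in any namespace.
    el = parent.find(f".//{{*}}entry[@key='{key}']")
    value = el.get("value") if el is not None else None
    return (value or default).strip().upper()

root = ET.fromstring(SETTINGS_XML)
model = root.find(".//{*}config[@key='model']")

match_and = entry_value(model, "matchCriteria", "AND") == "AND"
output_mode = entry_value(model, "outputMode", "MATCHING")
```

Falling back to a default when an entry is absent mirrors the `or "AND"` and `or "MATCHING"` fallbacks visible in the source of `parse_row_filter_settings`.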

Limitations

This module does not support all KNIME filtering options and may approximate behavior in certain cases.

References

For more information, refer to the KNIME documentation and the following hub URL: https://hub.knime.com/knime/extensions/org.knime.features.base/latest/org.knime.base.node.preproc.filter.row3.RowFilterNodeFactory

parse_row_filter_settings(node_dir)

Parse the row filter settings from the settings.xml file.

Parameters:

Name      Type            Description                                      Default
node_dir  Optional[Path]  The directory containing the settings.xml file.  required

Returns:

Name               Type               Description
RowFilterSettings  RowFilterSettings  The parsed row filter settings.

Source code in src/knime2py/nodes/row_filter.py
def parse_row_filter_settings(node_dir: Optional[Path]) -> RowFilterSettings:
    """
    Parse the row filter settings from the settings.xml file.

    Args:
        node_dir (Optional[Path]): The directory containing the settings.xml file.

    Returns:
        RowFilterSettings: The parsed row filter settings.
    """
    if not node_dir:
        return RowFilterSettings()
    settings_path = node_dir / "settings.xml"
    if not settings_path.exists():
        return RowFilterSettings()

    root = ET.parse(str(settings_path), parser=XML_PARSER).getroot()
    model_el = first_el(root, ".//*[local-name()='config' and @key='model']")

    match_and = True
    output_mode = "MATCHING"
    preds: List[Predicate] = []

    if model_el is not None:
        crit = (first(model_el, ".//*[local-name()='entry' and @key='matchCriteria']/@value") or "AND").strip().upper()
        match_and = (crit == "AND")
        output_mode = (first(model_el, ".//*[local-name()='entry' and @key='outputMode']/@value") or "MATCHING").strip().upper()

        # iterate predicate blocks under .../predicates/<config key='0'..>
        for p_cfg in model_el.xpath(
            ".//*[local-name()='config' and @key='predicates']/*[local-name()='config']"
        ):
            col = first(p_cfg, ".//*[local-name()='config' and @key='column']"
                               "/*[local-name()='entry' and @key='selected']/@value")
            op = (first(p_cfg, ".//*[local-name()='entry' and @key='operator']/@value") or "").strip().upper()
            vals = _collect_predicate_values(p_cfg)
            preds.append(Predicate(column=col or None, operator=op or None, values=vals))

    return RowFilterSettings(match_and=match_and, output_mode=output_mode, predicates=preds)

generate_imports()

Generate the necessary import statements for the row filter code.

Returns:

Type       Description
List[str]  A list of import statements.

Source code in src/knime2py/nodes/row_filter.py
def generate_imports():
    """
    Generate the necessary import statements for the row filter code.

    Returns:
        List[str]: A list of import statements.
    """
    # Need pandas and 're' for column normalization in runtime helpers
    return ["import pandas as pd", "import re as _re"]

generate_py_body(node_id, node_dir, in_ports, out_ports=None)

Generate the body of the Python code for the row filter node.

Parameters:

Name       Type                 Description                  Default
node_id    str                  The ID of the node.          required
node_dir   Optional[str]        The directory of the node.   required
in_ports   List[object]         The list of incoming ports.  required
out_ports  Optional[List[str]]  The list of outgoing ports.  None

Returns:

Type       Description
List[str]  A list of lines of code that make up the body of the node.

Source code in src/knime2py/nodes/row_filter.py
def generate_py_body(
    node_id: str,
    node_dir: Optional[str],
    in_ports: List[tuple[str, str]],
    out_ports: Optional[List[str]] = None,
) -> List[str]:
    """
    Generate the body of the Python code for the row filter node.

    Args:
        node_id (str): The ID of the node.
        node_dir (Optional[str]): The directory of the node.
        in_ports (List[object]): The list of incoming ports.
        out_ports (Optional[List[str]]): The list of outgoing ports.

    Returns:
        List[str]: A list of lines of code that make up the body of the node.
    """
    ndir = Path(node_dir) if node_dir else None
    cfg = parse_row_filter_settings(ndir)

    lines: List[str] = []
    lines.append(f"# {HUB_URL}")

    # Single input table
    pairs = normalize_in_ports(in_ports)
    src_id, in_port = pairs[0]
    lines.append(f"df = context['{src_id}:{in_port}']  # input table")

    # Emit filter logic
    lines.extend(_emit_filter_code(cfg))

    # Publish result (default port 1)
    ports = out_ports or ["1"]
    for p in sorted({(p or '1') for p in ports}):
        lines.append(f"context['{node_id}:{p}'] = out_df")

    return lines
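The publishing step at the end of `generate_py_body` can be isolated into a small sketch. `publish_lines` is a hypothetical helper that reproduces only the port handling shown above (default to port "1", collapse duplicates, sort); the node ID used in the example is made up.

```python
from typing import List, Optional

def publish_lines(node_id: str, out_ports: Optional[List[str]] = None) -> List[str]:
    """Build one context-assignment line per distinct output port (hypothetical helper)."""
    ports = out_ports or ["1"]  # no ports given: fall back to port "1"
    # Blank entries also map to "1"; the set removes duplicates.
    distinct = sorted({(p or "1") for p in ports})
    return [f"context['{node_id}:{p}'] = out_df" for p in distinct]

lines = publish_lines("1234", ["2", "", "2"])
```

Here the duplicate "2" collapses to one entry and the blank port becomes "1", so two assignment lines are produced.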

get_name()

Return name of the node in KNIME workflow.

Source code in src/knime2py/nodes/row_filter.py
def get_name() -> str:
    """Return name of the node in KNIME workflow."""
    return "Row Filter"

handle(ntype, nid, npath, incoming, outgoing)

Handle the processing of the row filter node.

Parameters:

Name      Type  Description                Default
ntype     -     The type of the node.      required
nid       -     The ID of the node.        required
npath     -     The path to the node.      required
incoming  -     The incoming connections.  required
outgoing  -     The outgoing connections.  required

Returns:

Type                         Description
Tuple[List[str], List[str]]  A tuple containing the imports and body lines.

Source code in src/knime2py/nodes/row_filter.py
def handle(ntype, nid, npath, incoming, outgoing):
    """
    Handle the processing of the row filter node.

    Args:
        ntype: The type of the node.
        nid: The ID of the node.
        npath: The path to the node.
        incoming: The incoming connections.
        outgoing: The outgoing connections.

    Returns:
        Tuple[List[str], List[str]]: A tuple containing the imports and body lines.
    """
    explicit_imports = collect_module_imports(generate_imports)

    in_ports = [(src_id, str(getattr(e, "source_port", "") or "1")) for src_id, e in (incoming or [])]
    out_ports = [str(getattr(e, "source_port", "") or "1") for _, e in (outgoing or [])] or ["1"]

    node_lines = generate_py_body(nid, npath, in_ports, out_ports)
    found_imports, body = split_out_imports(node_lines)
    imports = sorted(set(explicit_imports) | set(found_imports))
    return imports, body
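The port extraction and import merging in `handle` can be illustrated in isolation. In the sketch below, `ports_from_edges` copies the `getattr(..., "source_port", "") or "1"` pattern from the source, while `split_imports` is a hypothetical stand-in for `split_out_imports`, whose real implementation is not shown here; the edge objects are simple placeholders.

```python
from types import SimpleNamespace
from typing import List, Tuple

def ports_from_edges(edges) -> List[str]:
    """Read each edge's source_port as a string, defaulting to "1" when missing or empty."""
    return [str(getattr(e, "source_port", "") or "1") for _, e in (edges or [])]

def split_imports(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical stand-in: separate import statements from other lines."""
    imports, body = [], []
    for ln in lines:
        target = imports if ln.startswith(("import ", "from ")) else body
        target.append(ln)
    return imports, body

# Placeholder edges: one with an explicit port, one without the attribute.
edges = [("n1", SimpleNamespace(source_port=2)), ("n2", SimpleNamespace())]
ports = ports_from_edges(edges)

found, body = split_imports(["import pandas as pd", "df = context['n1:2']"])
# Union of explicit and discovered imports, sorted for stable output.
imports = sorted(set(["import re as _re"]) | set(found))
```

Taking the sorted union keeps the import list free of duplicates and independent of the order in which nodes are processed.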