column_filter

knime2py.nodes.column_filter

Column Filter module for KNIME to Python conversion.

Overview

This module generates Python code that filters columns from a DataFrame based on exclusion criteria specified in a KNIME node's settings.xml file. It fits into the knime2py generator pipeline by transforming the configuration into executable Python code.

Runtime Behavior

Inputs:

- Reads a DataFrame from the context using the key format 'src_id:in_port'.

Outputs:

- Writes the filtered DataFrame to the context using the key format 'node_id:out_port', where out_port defaults to '1'.

Key algorithms:

- The module implements a column exclusion mechanism, dropping the specified columns while preserving the order of the remaining columns.

Edge Cases

The code handles cases where:

- No columns are specified for exclusion, resulting in a passthrough.
- Missing columns in the DataFrame do not raise errors, due to the use of errors='ignore' in the drop operation.
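Putting the pieces together, the emitted filter reduces to a single pandas drop. The sketch below is a minimal, self-contained illustration of the pattern described above; the node and port ids ('1393', '1400') are hypothetical:

```python
import pandas as pd

# Hypothetical context produced by an upstream node '1393', output port '1'
context = {'1393:1': pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})}

df = context['1393:1']  # read input under 'src_id:in_port'
# Drop excluded columns; errors='ignore' makes missing names ('zzz') a no-op
out_df = df.drop(columns=['b', 'zzz'], errors='ignore')
context['1400:1'] = out_df  # publish under 'node_id:out_port'
```

With an empty exclusion list, the drop is skipped entirely and the input passes through unchanged.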

Generated Code Dependencies

The generated code requires the following external libraries:

- pandas

These dependencies are required by the generated code, not by this module.

Usage

This module is typically invoked by the knime2py emitter when processing a Column Filter node. An example of expected context access is:

df = context['source_id:1']  # Accessing the input DataFrame

Node Identity

KNIME factory id:

- FACTORY = "org.knime.base.node.preproc.filter.column.DataColumnSpecFilterNodeFactory"

Configuration

The settings are encapsulated in the ColumnFilterSettings dataclass, which contains:

- excludes: List of column names to exclude (default is an empty list).

The parse_column_filter_settings function extracts these values by scanning the settings.xml for configuration blocks containing 'exclude'.

Limitations

This module does not support inclusion lists; it only processes exclusion criteria.

References

For more information, refer to the KNIME documentation and the following URL: https://hub.knime.com/knime/extensions/org.knime.features.base/latest/org.knime.base.node.preproc.filter.column2.ColumnFilter2NodeFactory

first(root, xpath)

Return the first string value for xpath, stripped, or None.

If the xpath returns an element, prefer its @value, else its .text. If it returns a scalar (string/number), cast to str and strip.

Source code in src/knime2py/nodes/node_utils.py
def first(root: ET._Element, xpath: str) -> Optional[str]:
    """Return the first string value for xpath, stripped, or None.

    If the xpath returns an element, prefer its @value, else its .text.
    If it returns a scalar (string/number), cast to str and strip.
    """
    vals = root.xpath(xpath)
    if not vals:
        return None
    v = vals[0]
    # Element -> prefer @value then .text
    if isinstance(v, ET._Element):
        if v.get("value") is not None:
            return (v.get("value") or "").strip()
        return (v.text or "").strip()
    # Scalar / attribute string / number
    return (str(v) if v is not None else "").strip()

first_el(root, xpath)

Return the first Element for xpath, or None (ignores non-Elements).

Source code in src/knime2py/nodes/node_utils.py
def first_el(root: ET._Element, xpath: str) -> Optional[ET._Element]:
    """Return the first Element for xpath, or None (ignores non-Elements)."""
    vals = root.xpath(xpath)
    for v in vals:
        if isinstance(v, ET._Element):
            return v
    return None

all_values(root, xpath)

Return all values for xpath as stripped strings.

Source code in src/knime2py/nodes/node_utils.py
def all_values(root: ET._Element, xpath: str) -> List[str]:
    """Return all values for xpath as stripped strings."""
    return [(v or "").strip() for v in root.xpath(xpath)]

iter_entries(root)

Yield (key, value) pairs for all KNIME &lt;entry key="..." value="..."/&gt; nodes.

Source code in src/knime2py/nodes/node_utils.py
def iter_entries(root: ET._Element):
    """Yield (key, value) pairs for all KNIME <entry key="..." value="..."/> nodes."""
    for ent in root.xpath(_ENTRY_XPATH):
        k = (ent.get("key") or "").strip()
        v = ent.get("value")
        yield k, (v or "").strip() if v is not None else None

normalize_delim(raw)

Normalize delimiter strings to their corresponding character representation.

Source code in src/knime2py/nodes/node_utils.py
def normalize_delim(raw: Optional[str]) -> Optional[str]:
    """Normalize delimiter strings to their corresponding character representation."""
    if raw is None:
        return None
    v = raw.strip()
    if len(v) == 1:
        return v
    up = v.upper()
    if up in {"TAB", "\\T", "CTRL-I"}:
        return "\t"
    if up in {"COMMA"}:
        return ","
    if up in {"SEMICOLON", "SEMI", "SC"}:
        return ";"
    if up in {"SPACE"}:
        return " "
    if up in {"PIPE"}:
        return "|"
    if v == "\\t":
        return "\t"
    return v or None
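The normalization above amounts to a lookup table over common delimiter names. The sketch below is a simplified restatement for illustration, not the actual helper:

```python
# Simplified restatement of the delimiter-normalization rules (illustrative only)
NAMED_DELIMS = {
    "TAB": "\t", "\\T": "\t", "CTRL-I": "\t",
    "COMMA": ",",
    "SEMICOLON": ";", "SEMI": ";", "SC": ";",
    "SPACE": " ",
    "PIPE": "|",
}

def normalize_delim_sketch(raw):
    if raw is None:
        return None
    v = raw.strip()
    if len(v) == 1:          # already a single character, use as-is
        return v
    return NAMED_DELIMS.get(v.upper(), v or None)
```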

normalize_char(raw)

Normalize character strings to their corresponding single character representation.

Source code in src/knime2py/nodes/node_utils.py
def normalize_char(raw: Optional[str]) -> Optional[str]:
    """Normalize character strings to their corresponding single character representation."""
    if not raw:
        return None
    v = raw.strip()
    if v.upper() in {"", "NONE", "NULL"}:
        return None
    if v == "&quot;":
        return '"'
    if v == "&apos;":
        return "'"
    return v[:1] if len(v) >= 1 else None

looks_like_path(s)

Check if the given string looks like a file path.

Source code in src/knime2py/nodes/node_utils.py
def looks_like_path(s: str) -> bool:
    """Check if the given string looks like a file path."""
    if not s:
        return False
    low = s.lower()
    if low.startswith(("file:", "s3:", "hdfs:", "abfss:", "http://", "https://")):
        return True
    if s.endswith(".csv"):
        return True
    if "/" in s or "\\" in s:
        return True
    return False
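A few illustrative inputs and the classification the heuristic above produces (the helper is reproduced inline so the example runs standalone):

```python
def looks_like_path(s):
    """Heuristic from node_utils, reproduced for illustration."""
    if not s:
        return False
    low = s.lower()
    if low.startswith(("file:", "s3:", "hdfs:", "abfss:", "http://", "https://")):
        return True
    if s.endswith(".csv"):
        return True
    if "/" in s or "\\" in s:
        return True
    return False

examples = {
    "data/input.csv": True,    # contains a path separator
    "s3://bucket/key": True,   # known scheme prefix
    "settings.xml": False,     # bare file name -> avoids the false positive
}
```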

bool_from_value(v)

Convert a string value to a boolean.

Source code in src/knime2py/nodes/node_utils.py
def bool_from_value(v: Optional[str]) -> Optional[bool]:
    """Convert a string value to a boolean."""
    if v is None:
        return None
    t = v.strip().lower()
    if t in {"true", "1", "yes", "y"}:
        return True
    if t in {"false", "0", "no", "n"}:
        return False
    return None

normalize_in_ports(in_ports)

Accepts items like ('1393','1') or '1393:1' (or just '1393') and returns a normalized list of (src_id, port) as strings.

Source code in src/knime2py/nodes/node_utils.py
def normalize_in_ports(in_ports: List[tuple[str, str]]) -> List[Tuple[str, str]]:
    """
    Accepts items like ('1393','1') or '1393:1' (or just '1393') and
    returns a normalized list of (src_id, port) as strings.
    """
    norm: List[Tuple[str, str]] = []
    for item in in_ports or []:
        if isinstance(item, tuple) and len(item) == 2:
            src, port = str(item[0]), str(item[1] or "1")
            norm.append((src, port))
        else:
            s = str(item)
            if ":" in s:
                src, port = s.split(":", 1)
                norm.append((src, port or "1"))
            elif s:
                norm.append((s, "1"))
    if not norm:
        norm.append(("UNKNOWN", "1"))
    return norm
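Concretely, the accepted input shapes map as follows (the helper is reproduced inline so the example runs standalone):

```python
def normalize_in_ports(in_ports):
    # Reproduced from node_utils for illustration
    norm = []
    for item in in_ports or []:
        if isinstance(item, tuple) and len(item) == 2:
            norm.append((str(item[0]), str(item[1] or "1")))
        else:
            s = str(item)
            if ":" in s:
                src, port = s.split(":", 1)
                norm.append((src, port or "1"))
            elif s:
                norm.append((s, "1"))
    if not norm:
        norm.append(("UNKNOWN", "1"))
    return norm

# Tuple, 'src:port' string, and bare-id forms all normalize to (src_id, port)
pairs = normalize_in_ports([("1393", "1"), "1400:2", "1401"])
```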

context_assignment_lines(node_id, out_ports)

For reader-like nodes that produce a dataframe named df, publish it under context keys '&lt;node_id&gt;:&lt;port&gt;'.

Source code in src/knime2py/nodes/node_utils.py
def context_assignment_lines(node_id: str, out_ports: List[str]) -> List[str]:
    """
    For reader-like nodes that produce a dataframe named `df`,
    publish it under context keys '<node_id>:<port>'.
    """
    ports = sorted({(p or "1") for p in (out_ports or [])}) or ["1"]
    return [f"context['{node_id}:{p}'] = df" for p in ports]
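For example, with a hypothetical node id '1400' and a port list containing duplicates and a missing value, the helper (reproduced inline) deduplicates and defaults to port '1':

```python
def context_assignment_lines(node_id, out_ports):
    # Reproduced from node_utils for illustration
    ports = sorted({(p or "1") for p in (out_ports or [])}) or ["1"]
    return [f"context['{node_id}:{p}'] = df" for p in ports]

# None falls back to port '1'; the duplicate '2' collapses to one assignment
lines = context_assignment_lines("1400", ["2", None, "2"])
```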

extract_csv_path(root)

Prefer keys that sound like file paths; fall back to any entry value that looks like a path. Avoid false-positives like node_file='settings.xml' via looks_like_path().

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_path(root: ET._Element) -> Optional[str]:
    """
    Prefer keys that *sound* like file paths; fall back to any entry value that looks like a path.
    Avoid false-positives like node_file='settings.xml' via looks_like_path().
    """
    # Prefer specific-ish keys first
    for pat in (r"\bpath\b", r"\burl\b", r"\bfile\b", r"location"):
        v = _first_value_re(root, pat)
        if v and looks_like_path(v):
            return v
    # Fallback: any entry value that looks like a CSV/path
    for _k, v in iter_entries(root):
        if v and looks_like_path(v):
            return v
    return None

extract_csv_sep(root)

Extract the CSV separator from the XML configuration.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_sep(root: ET._Element) -> Optional[str]:
    """Extract the CSV separator from the XML configuration."""
    raw = _first_value_re(root, r"(delim|separator|column[_-]?delimiter)\b")
    return normalize_delim(raw)

extract_csv_quotechar(root)

Extract the quote character used in the CSV configuration.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_quotechar(root: ET._Element) -> Optional[str]:
    """Extract the quote character used in the CSV configuration."""
    raw = _first_value_re_excluding(root, r"\bquote(_?char)?\b", r"escape")
    if raw is None:
        # looser fallback: any 'quote' key that isn't an escape
        for k, v in iter_entries(root):
            if "quote" in k.lower() and "escape" not in k.lower():
                raw = v
                break
    return normalize_char(raw)

extract_csv_escapechar(root)

Extract the escape character used in the CSV configuration.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_escapechar(root: ET._Element) -> Optional[str]:
    """Extract the escape character used in the CSV configuration."""
    raw = _first_value_re(root, r"escape")
    return normalize_char(raw)

extract_csv_encoding(root)

Extract the character encoding from the CSV configuration.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_encoding(root: ET._Element) -> Optional[str]:
    """Extract the character encoding from the CSV configuration."""
    return (
        _first_value_re(root, r"\bcharacter_set\b")
        or _first_value_re(root, r"\bcharset\b")
        or _first_value_re(root, r"encoding")
    )

extract_csv_header_reader(root)

Reader: look for 'column header', 'hasheader', or plain 'header', but avoid writer-only keys like 'write_header'.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_header_reader(root: ET._Element) -> Optional[bool]:
    """
    Reader: look for 'column header', 'hasheader', or plain 'header', but avoid writer-only
    keys like 'write_header'.
    """
    for k, v in iter_entries(root):
        lk = k.lower()
        if "header" not in lk:
            continue
        if "write" in lk:
            continue
        if "column" in lk or "hasheader" in lk or lk == "header":
            return bool_from_value(v)
    return None

extract_csv_header_writer(root)

Writer: prefer explicit 'writeColumnHeader'/'write_header'; otherwise any key whose name contains both 'write' and 'header'.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_header_writer(root: ET._Element) -> Optional[bool]:
    """
    Writer: prefer explicit 'writeColumnHeader'/'write_header'; otherwise any key whose
    name contains both 'write' and 'header'.
    """
    v = (
        _first_value_re(root, r"\bwriteColumnHeader\b")
        or _first_value_re(root, r"\bwrite_header\b")
        or _first_value_all_tokens(root, ["write", "header"])
    )
    return bool_from_value(v) if v is not None else None

extract_csv_na_rep(root)

Writer NA representation:

- modern: key='missing_value_pattern' (may be empty string '')
- older: key contains both 'missing' and 'representation'

Keep empty string '' as a real value; return None only if not set.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_na_rep(root: ET._Element) -> Optional[str]:
    """
    Writer NA representation:
      - modern: key='missing_value_pattern' (may be empty string '')
      - older: key contains both 'missing' and 'representation'
    Keep empty string '' as a real value; return None only if not set.
    """
    v = _first_value_re(root, r"^missing_value_pattern$")
    if v is None:
        v = _first_value_all_tokens(root, ["missing", "representation"])
    return v

extract_csv_include_index(root)

Extract whether to include the index in the CSV output.

Source code in src/knime2py/nodes/node_utils.py
def extract_csv_include_index(root: ET._Element) -> Optional[bool]:
    """Extract whether to include the index in the CSV output."""
    raw = _first_value_re(root, r"include[_-]?index")
    return bool_from_value(raw)

extract_table_spec_types(root)

Return {column_name: java_class} from table_spec_config_Internals. Looks under .../individual_specs/*/&lt;config key='0'..&gt; blocks.

Source code in src/knime2py/nodes/node_utils.py
def extract_table_spec_types(root: ET._Element) -> dict:
    """
    Return {column_name: java_class} from table_spec_config_Internals.
    Looks under .../individual_specs/*/<config key='0'..> blocks.
    """
    out = {}
    for cfg in root.xpath(
        ".//*[local-name()='config' and @key='table_spec_config_Internals']"
        "/*[local-name()='config' and @key='individual_specs']"
        "/*[local-name()='config']"  # per file block
        "/*[local-name()='config' and re:test(@key, '^[0-9]+$')]",
        namespaces={'re': "http://exslt.org/regular-expressions"}
    ):
        name = first(cfg, ".//*[local-name()='entry' and @key='name']/@value")
        jcls = first(cfg, ".//*[local-name()='config' and @key='type']"
                          "/*[local-name()='entry' and @key='class']/@value")
        if name:
            out[name] = jcls or ""
    return out

java_to_pandas_dtype(java_class)

Map KNIME java types to pandas nullable dtypes.

Source code in src/knime2py/nodes/node_utils.py
def java_to_pandas_dtype(java_class: str) -> Optional[str]:
    """
    Map KNIME java types to pandas nullable dtypes.
    """
    j = (java_class or "").lower()
    if "integer" in j or "long" in j or "intcell" in j:
        return "Int64"
    if "double" in j or "float" in j:
        return "Float64"
    if "boolean" in j:
        return "boolean"
    if "string" in j:
        return "string"
    # leave unknowns to inference
    return None
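Applied to a hypothetical KNIME table spec, the mapping above lets generated reader code coerce columns to nullable pandas dtypes while keeping missing values intact (the mapper is reproduced inline; pandas is assumed available):

```python
import pandas as pd

def java_to_pandas_dtype(java_class):
    # Reproduced from node_utils for illustration
    j = (java_class or "").lower()
    if "integer" in j or "long" in j or "intcell" in j:
        return "Int64"
    if "double" in j or "float" in j:
        return "Float64"
    if "boolean" in j:
        return "boolean"
    if "string" in j:
        return "string"
    return None  # leave unknowns to inference

# Hypothetical spec as returned by extract_table_spec_types()
spec = {"age": "org.knime.core.data.def.IntCell",
        "name": "org.knime.core.data.def.StringCell"}

df = pd.DataFrame({"age": [30, None], "name": ["ann", "bob"]})
for col, jcls in spec.items():
    dtype = java_to_pandas_dtype(jcls)
    if dtype:
        df[col] = df[col].astype(dtype)
```

The nullable 'Int64' dtype preserves the missing age as pd.NA instead of silently promoting the column to float.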

collect_module_imports(mod_or_func)

Return a sorted list of unique import lines from either:

- a module object that defines generate_imports()
- a callable (e.g. the generate_imports function itself)

Source code in src/knime2py/nodes/node_utils.py
def collect_module_imports(mod_or_func: Optional[Union[object, Callable[[], Iterable[str]]]]) -> List[str]:
    """
    Return a sorted list of unique import lines from either:
      - a module object that defines generate_imports()
      - a callable (e.g. the generate_imports function itself)
    """
    imports = set()
    try:
        if mod_or_func is None:
            return []
        # If they passed the function directly
        if callable(mod_or_func):
            result = mod_or_func()
            items = _coerce_iterable(result)
        else:
            gi = getattr(mod_or_func, "generate_imports", None)
            if callable(gi):
                result = gi()
                items = _coerce_iterable(result)
            else:
                items = []
        for line in items:
            s = (line or "").strip()
            if s:
                imports.add(s)
    except Exception:
        # don’t let import gathering crash codegen
        return []
    return sorted(imports)

split_out_imports(lines)

Return (found_imports, body_without_imports). Any line that begins with 'import ' or 'from ' (ignoring leading spaces) is treated as an import.

Source code in src/knime2py/nodes/node_utils.py
def split_out_imports(lines: List[str]) -> tuple[List[str], List[str]]:
    """
    Return (found_imports, body_without_imports).
    Any line that begins with 'import ' or 'from ' (ignoring leading spaces) is treated as an import.
    """
    found: List[str] = []
    body: List[str] = []
    for ln in lines or []:
        s = ln.lstrip()
        if s.startswith("import ") or s.startswith("from "):
            found.append(s.strip())
        else:
            body.append(ln)
    return found, body
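A quick illustration of the split (helper reproduced inline): import lines are hoisted regardless of indentation, and everything else stays in order as the body:

```python
def split_out_imports(lines):
    # Reproduced from node_utils for illustration
    found, body = [], []
    for ln in lines or []:
        s = ln.lstrip()
        if s.startswith("import ") or s.startswith("from "):
            found.append(s.strip())
        else:
            body.append(ln)
    return found, body

imports, body = split_out_imports([
    "import pandas as pd",
    "df = context['1393:1']",
    "    from pathlib import Path",  # leading spaces are ignored for detection
    "out_df = df",
])
```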

resolve_reader_path(root, node_dir)

Resolve the path from settings.xml. Supports:

- LOCAL: absolute path is used as-is
- RELATIVE + knime.workflow: path is relative to the workflow directory

Source code in src/knime2py/nodes/node_utils.py
def resolve_reader_path(root: ET._Element, node_dir: Path) -> Optional[str]:
    """
    Resolve the path from settings.xml. Supports:
      - LOCAL: absolute path is used as-is
      - RELATIVE + knime.workflow: path is relative to the workflow directory
    """
    path_cfg = first_el(root, ".//*[local-name()='config' and @key='path']")
    if path_cfg is None:
        return None

    raw_path = first(path_cfg, ".//*[local-name()='entry' and @key='path']/@value")
    fs_type  = first(path_cfg, ".//*[local-name()='entry' and @key='file_system_type']/@value")
    spec     = first(path_cfg, ".//*[local-name()='entry' and @key='file_system_specifier']/@value")

    if not raw_path:
        return None

    node_has_settings = (node_dir / "settings.xml").exists()
    workflow_dir = node_dir.parent if node_has_settings else node_dir

    try:
        p = Path(raw_path)
        if (fs_type or "").upper() == "LOCAL" or p.is_absolute():
            return str(p.expanduser().resolve())

        if (fs_type or "").upper() == "RELATIVE" and (spec or "").lower() == "knime.workflow":
            return str((workflow_dir / raw_path).expanduser().resolve())

        # Fallback: treat as relative to workflow_dir
        return str((workflow_dir / raw_path).expanduser().resolve())
    except Exception:
        # Last-ditch: just return the raw string
        return raw_path

generate_imports()

Generate the necessary import statements for the generated Python code.

Returns:

- List[str]: A list of import statements.

Source code in src/knime2py/nodes/column_filter.py
def generate_imports() -> List[str]:
    """
    Generate the necessary import statements for the generated Python code.

    Returns:
        List[str]: A list of import statements.
    """
    return ["import pandas as pd"]

generate_py_body(node_id, node_dir, in_ports, out_ports=None)

Generate the body of the Python code for the node.

Parameters:

- node_id (str): The ID of the node. Required.
- node_dir (Optional[str]): The directory of the node. Required.
- in_ports (List[object]): The input ports for the node. Required.
- out_ports (Optional[List[str]]): The output ports for the node. Defaults to None.

Returns:

- List[str]: A list of Python code lines for the node's functionality.

Source code in src/knime2py/nodes/column_filter.py
def generate_py_body(
    node_id: str,
    node_dir: Optional[str],
    in_ports: List[tuple[str, str]],
    out_ports: Optional[List[str]] = None,
) -> List[str]:
    """
    Generate the body of the Python code for the node.

    Args:
        node_id (str): The ID of the node.
        node_dir (Optional[str]): The directory of the node.
        in_ports (List[object]): The input ports for the node.
        out_ports (Optional[List[str]]): The output ports for the node.

    Returns:
        List[str]: A list of Python code lines for the node's functionality.
    """
    ndir = Path(node_dir) if node_dir else None
    settings = parse_column_filter_settings(ndir)

    lines: List[str] = []
    lines.append(f"# {HUB_URL}")

    pairs = normalize_in_ports(in_ports)
    src_id, in_port = pairs[0]
    lines.append(f"df = context['{src_id}:{in_port}']  # input table")

    lines.extend(_emit_filter_code(settings))

    ports = out_ports or ["1"]
    for p in sorted({(p or '1') for p in ports}):
        lines.append(f"context['{node_id}:{p}'] = out_df")

    return lines

get_name()

Return name of the node in KNIME workflow.

Source code in src/knime2py/nodes/column_filter.py
def get_name() -> str:
    """Return name of the node in KNIME workflow."""
    return "Column Filter"

handle(ntype, nid, npath, incoming, outgoing)

Handle the processing of a node, returning the necessary imports and body lines.

Parameters:

- ntype: The type of the node. Required.
- nid: The ID of the node. Required.
- npath: The path to the node. Required.
- incoming: The incoming connections to the node. Required.
- outgoing: The outgoing connections from the node. Required.

Returns:

- Tuple[List[str], List[str]]: A tuple containing the list of imports and the body lines.

Source code in src/knime2py/nodes/column_filter.py
def handle(ntype, nid, npath, incoming, outgoing):
    """
    Handle the processing of a node, returning the necessary imports and body lines.

    Args:
        ntype: The type of the node.
        nid: The ID of the node.
        npath: The path to the node.
        incoming: The incoming connections to the node.
        outgoing: The outgoing connections from the node.

    Returns:
        Tuple[List[str], List[str]]: A tuple containing the list of imports and the body lines.
    """
    explicit_imports = collect_module_imports(generate_imports)

    in_ports = [(src_id, str(getattr(e, "source_port", "") or "1")) for src_id, e in (incoming or [])]
    out_ports = [str(getattr(e, "source_port", "") or "1") for _, e in (outgoing or [])] or ["1"]

    node_lines = generate_py_body(nid, npath, in_ports, out_ports)
    found_imports, body = split_out_imports(node_lines)
    imports = sorted(set(explicit_imports) | set(found_imports))
    return imports, body