Rodeo Mapping Files Guide

Mapping files define how Rodeo transforms source data into graph nodes and relationships. This guide provides a complete reference for creating mapping files.

Overview

A mapping file is a YAML document that tells Rodeo:

  • Which fields to extract from your source data
  • How to create nodes with labels and attributes
  • How to create relationships between nodes
  • When to create or merge entities based on conditions
  • Which indexes to create for query performance

Basic Structure

A mapping file has three required top-level keys (name, fields, nodes) and two optional ones (relationships, indexes):

name: my-mapping           # Required: unique name for this mapping

fields:                    # Required: field definitions from source data
  - id: fieldId
    key: SourceFieldName
    type: String

nodes:                     # Required: node definitions
  - id: nodeId
    labels:
      - MyLabel
    attributes:
      - id: fieldId

relationships:             # Optional: relationship definitions
  - type: RELATES_TO
    srcNode: nodeId1       # id of a node defined under nodes
    dstNode: nodeId2       # id of a node defined under nodes

indexes:                   # Optional: index definitions
  - name: my_index
    label: MyLabel
    attributes:
      - fieldId

Fields

The fields section defines which data to extract from your source file and how to handle it. Each field maps a source column to an internal identifier used throughout the rest of the mapping.

Basic Field Definition

fields:
  - id: userId          # Internal identifier used in nodes/relationships
    key: user_id        # Column name in source CSV or field name in source JSON
    type: String        # Data type in source

Supported Data Types

Type          Description
String        Text data
Number        Numeric data (integers or decimals)
Datetime      Date and time values
List          Array of values (support in progress)
Dictionary    Nested key-value pairs (support in progress)

Type Conversion

When source data is stored as one type but should be treated as another, use convertType:

fields:
  # String field containing a date
  - id: createdAt
    key: created_at
    type: String
    convertType: Datetime

  # String field containing a number
  - id: amount
    key: total_amount
    type: String
    convertType: Number

  # Numeric timestamp to datetime
  - id: timestamp
    key: unix_timestamp
    type: Number
    convertType: Datetime
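To make the three conversions concrete, here is a rough Python model of what each one does to a single value. It is an illustration only, not Rodeo's implementation. The accepted date formats are not specified in this guide, so the sketch assumes ISO 8601 strings and Unix timestamps in seconds (UTC). The function name convert is hypothetical.

```python
from datetime import datetime, timezone

def convert(value, source_type, convert_type):
    """Convert one raw source value according to a type/convertType pair."""
    if convert_type == "Number":
        text = str(value).strip()
        try:
            return int(text)          # "42" stays an integer
        except ValueError:
            return float(text)        # "19.99" becomes a decimal
    if convert_type == "Datetime":
        if source_type == "Number":
            # Assumption: numeric timestamps are seconds since the Unix epoch, in UTC
            return datetime.fromtimestamp(value, tz=timezone.utc)
        # Assumption: string dates are ISO 8601, e.g. "2024-03-01T12:30:00"
        return datetime.fromisoformat(value)
    raise ValueError(f"unsupported conversion to {convert_type}")

print(convert("19.99", "String", "Number"))
print(convert("2024-03-01T12:30:00", "String", "Datetime"))
print(convert(0, "Number", "Datetime"))
```

If your source uses a different date format or millisecond timestamps, check how Rodeo handles it before relying on convertType.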

Nodes

The nodes section defines how to create graph nodes from your source data.

Basic Node Definition

nodes:
  - id: userNode              # Internal identifier for this node definition
    labels:                   # One or more Neo4j labels
      - User
      - Person
    attributes:               # Fields to include as node properties
      - id: userId
      - id: email
      - id: createdAt

Attribute Aliasing

Rename a field when storing it as a node attribute using newId:

nodes:
  - id: userNode
    labels:
      - User
    attributes:
      - id: userId
      - id: email
        newId: emailAddress   # Stored as "emailAddress" on the node
      - id: createdAt
        newId: memberSince
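The effect of newId can be pictured as a rename applied while node properties are assembled. The sketch below assumes a record is a dict keyed by field id; build_properties is a hypothetical helper, not a Rodeo function.

```python
def build_properties(record, attributes):
    """Copy the listed fields from a record, renaming any that carry a newId."""
    props = {}
    for attr in attributes:
        stored_name = attr.get("newId", attr["id"])   # fall back to the field id
        props[stored_name] = record[attr["id"]]
    return props

record = {"userId": "u-1", "email": "ada@example.com", "createdAt": "2024-03-01"}
attributes = [
    {"id": "userId"},
    {"id": "email", "newId": "emailAddress"},
    {"id": "createdAt", "newId": "memberSince"},
]
print(build_properties(record, attributes))
```

The field ids email and createdAt do not appear on the resulting node; only the aliased names do.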

Conditions

Control when nodes are created using conditions. Conditions support two types of checks: comparisons against values and presence checks.

Comparison Conditions

Compare a field's value against a literal value:

nodes:
  - id: premiumUser
    conditions:
      comparisons:
        - field: accountType
          op: eq
          value: "premium"
    labels:
      - User
      - PremiumUser
    attributes:
      - id: userId

Supported comparison operators:

Operator    Description
eq          Equal to
neq         Not equal to
gt          Greater than
gte         Greater than or equal to
lt          Less than
lte         Less than or equal to

Presence Conditions

Check if a field exists and is non-null:

nodes:
  - id: verifiedUser
    conditions:
      contains:
        - verificationDate    # Only create node if this field is present
    labels:
      - User
      - Verified
    attributes:
      - id: userId
      - id: verificationDate

Combined Conditions

Use both comparison and presence conditions together. All conditions must be met:

nodes:
  - id: activeAdmin
    conditions:
      comparisons:
        - field: role
          op: eq
          value: "admin"
        - field: loginCount
          op: gte
          value: 1
      contains:
        - lastLoginDate
    labels:
      - User
      - Admin
    attributes:
      - id: userId
      - id: role
      - id: lastLoginDate
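The all-conditions-must-be-met rule can be modelled in a few lines. This is a sketch of the semantics described above, not Rodeo's code. It assumes a record is a dict keyed by field id and that a comparison against a missing or null field fails, which this guide does not state explicitly.

```python
import operator

# The six documented operators mapped to Python comparison functions
OPS = {
    "eq": operator.eq, "neq": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}

def conditions_met(record, conditions):
    """Return True only if every presence check and every comparison passes."""
    for field in conditions.get("contains", []):
        if record.get(field) is None:          # absent or null
            return False
    for cmp in conditions.get("comparisons", []):
        value = record.get(cmp["field"])
        # Assumption: a missing value fails the comparison
        if value is None or not OPS[cmp["op"]](value, cmp["value"]):
            return False
    return True

active_admin = {
    "comparisons": [
        {"field": "role", "op": "eq", "value": "admin"},
        {"field": "loginCount", "op": "gte", "value": 1},
    ],
    "contains": ["lastLoginDate"],
}
print(conditions_met({"role": "admin", "loginCount": 3, "lastLoginDate": "2024-01-01"}, active_admin))
print(conditions_met({"role": "admin", "loginCount": 3}, active_admin))
```

The second record has the right role and login count but no lastLoginDate, so the node would not be created for it.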

Merge Behavior

By default, Rodeo creates new nodes. To update existing nodes instead of creating duplicates, define merge criteria:

nodes:
  - id: userNode
    labels:
      - User
    attributes:
      - id: userId
      - id: email
      - id: name
      - id: lastLogin
    merge:
      labels:
        - User
      attributes:
        - userId          # Match on userId to find existing nodes

When a merge is defined, Rodeo will:

  1. Search for existing nodes matching the specified labels and attributes
  2. Update the existing node if found
  3. Create a new node if no match exists

You can merge on multiple attributes for composite uniqueness:

    merge:
      labels:
        - Transaction
      attributes:
        - accountId
        - transactionDate
        - amount
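The three merge steps can be illustrated with an in-memory stand-in for the graph. This sketch is an assumption-laden model rather than Rodeo's implementation: the graph is a plain list, merge_node is a hypothetical name, and "update" is taken to mean overwriting the matched node's properties with the incoming ones.

```python
def merge_node(graph, labels, properties, merge):
    """Update the node matching the merge criteria, or create it if none exists."""
    for node in graph:
        has_labels = set(merge["labels"]) <= node["labels"]
        same_keys = all(
            node["properties"].get(attr) == properties.get(attr)
            for attr in merge["attributes"]
        )
        if has_labels and same_keys:
            node["properties"].update(properties)   # step 2: update the match
            return node
    node = {"labels": set(labels), "properties": dict(properties)}
    graph.append(node)                              # step 3: create a new node
    return node

graph = []
merge = {"labels": ["User"], "attributes": ["userId"]}
merge_node(graph, ["User"], {"userId": "u-1", "email": "old@example.com"}, merge)
merge_node(graph, ["User"], {"userId": "u-1", "email": "new@example.com"}, merge)
merge_node(graph, ["User"], {"userId": "u-2", "email": "other@example.com"}, merge)
print(len(graph))
```

Two records share userId u-1, so they collapse into one node carrying the most recent email, and the graph ends up with two nodes rather than three.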

Iterators

When a field contains a list, use an iterator to create a node for each item in the list:

fields:
  - id: tagList
    key: tags
    type: List

nodes:
  - id: tagNode
    conditions:
      contains:
        - tagList
    labels:
      - Tag
    iterator:
      id: tagList              # The list field to iterate over
      newId: tagName           # Attribute name for each item
      type: String             # Type of each item in the list
    attributes:
      - id: tagName            # Use the iterator's newId here
      - id: documentId         # Regular field; its definition under fields is not shown here

This creates one Tag node for each item in the tagList array.
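As an illustration of the expansion, the sketch below turns one record into one property set per list item. It assumes a record is a dict keyed by field id and that non-iterator attributes (documentId here) are repeated on every generated node. expand_iterator is a hypothetical helper, not part of Rodeo.

```python
def expand_iterator(record, iterator, attributes):
    """Return one property dict per item of the list field named by the iterator."""
    nodes = []
    for item in record.get(iterator["id"]) or []:
        scoped = dict(record)
        scoped[iterator["newId"]] = item       # expose the current item under newId
        nodes.append({a.get("newId", a["id"]): scoped[a["id"]] for a in attributes})
    return nodes

record = {"tagList": ["graph", "etl"], "documentId": "doc-7"}
iterator = {"id": "tagList", "newId": "tagName", "type": "String"}
attributes = [{"id": "tagName"}, {"id": "documentId"}]
for props in expand_iterator(record, iterator, attributes):
    print(props)
```

A record whose tagList is empty or missing yields no nodes in this model, which matches the role of the contains condition in the mapping above.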

Relationships

The relationships section defines connections between nodes.

Basic Relationship Definition

relationships:
  - type: WORKS_FOR           # Relationship type in Neo4j
    srcNode: employeeNode     # Source node id (from nodes section)
    dstNode: companyNode      # Destination node id (from nodes section)

Relationship Attributes

Add properties to relationships:

relationships:
  - type: PURCHASED
    srcNode: customerNode
    dstNode: productNode
    attributes:
      - id: purchaseDate
      - id: quantity
      - id: price
        newId: purchasePrice

Required Relationships

Mark a relationship as required when both source and destination nodes must exist:

relationships:
  - type: BELONGS_TO
    required: True
    srcNode: itemNode
    dstNode: categoryNode

When required: True is set and either the source or the destination node does not exist for a record (for example, because its conditions failed), Rodeo skips that record and continues processing the remaining records.
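The difference between a required and an optional relationship can be sketched as a per-record decision. The model below is an assumption, not Rodeo's code: it takes the set of node ids whose conditions passed for the current record and assumes that an optional relationship with a missing endpoint is simply not created. plan_record is a hypothetical name.

```python
def plan_record(created_nodes, relationships):
    """Return the relationship types to create, or None if the record is skipped."""
    to_create = []
    for rel in relationships:
        both_exist = rel["srcNode"] in created_nodes and rel["dstNode"] in created_nodes
        if both_exist:
            to_create.append(rel["type"])
        elif rel.get("required", False):
            return None        # a required relationship is missing an endpoint
        # Assumption: an optional relationship with a missing endpoint is left out
    return to_create

relationships = [
    {"type": "BELONGS_TO", "required": True, "srcNode": "itemNode", "dstNode": "categoryNode"},
    {"type": "TAGGED", "srcNode": "itemNode", "dstNode": "tagNode"},
]
print(plan_record({"itemNode", "categoryNode"}, relationships))
print(plan_record({"itemNode", "tagNode"}, relationships))
```

In the second call categoryNode was not created, so the required BELONGS_TO relationship cannot be built and the whole record is skipped.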

Relationship Conditions

Apply conditions to control when relationships are created:

relationships:
  - type: MANAGES
    srcNode: managerNode
    dstNode: employeeNode
    conditions:
      comparisons:
        - field: managerLevel
          op: gte
          value: 2
      contains:
        - departmentId
    attributes:
      - id: departmentId

Relationship Merge

Merge relationships to avoid duplicates. In this example the merge block matches on followerId, the aliased name assigned by newId:

relationships:
  - type: FOLLOWS
    srcNode: userNode
    dstNode: followedUserNode
    attributes:
      - id: followDate
      - id: userId
        newId: followerId
    merge:
      attributes:
        - followerId

Indexes

The indexes section defines indexes to create before processing. Indexes improve query performance and are especially important for merge operations.

Node Indexes

Create an index on node attributes:

indexes:
  - name: user_email_index
    label: User               # Node label
    attributes:
      - email

Multi-Attribute Indexes

Create composite indexes on multiple attributes:

indexes:
  - name: transaction_lookup
    label: Transaction
    attributes:
      - accountId
      - transactionDate

Relationship Indexes

Create an index on relationship attributes:

indexes:
  - name: purchase_index
    type: PURCHASED           # Relationship type (use 'type' instead of 'label')
    attributes:
      - purchaseDate
      - purchasePrice
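For readers who want to picture what these definitions correspond to in Neo4j, the sketch below renders an index entry as a Cypher CREATE INDEX statement. The Cypher syntax is standard in recent Neo4j versions, but whether Rodeo issues exactly these statements is an assumption, as is the rule that an entry carries exactly one of label or type. index_statement is a hypothetical helper.

```python
def index_statement(index):
    """Render one index definition as a Cypher CREATE INDEX statement."""
    if ("label" in index) == ("type" in index):
        raise ValueError("an index needs exactly one of 'label' or 'type'")
    if "label" in index:
        pattern, var = f"(n:{index['label']})", "n"            # node index
    else:
        pattern, var = f"()-[r:{index['type']}]-()", "r"       # relationship index
    props = ", ".join(f"{var}.{attr}" for attr in index["attributes"])
    return f"CREATE INDEX {index['name']} IF NOT EXISTS FOR {pattern} ON ({props})"

print(index_statement({"name": "user_email_index", "label": "User", "attributes": ["email"]}))
print(index_statement({
    "name": "purchase_index",
    "type": "PURCHASED",
    "attributes": ["purchaseDate", "purchasePrice"],
}))
```

Listing several attributes produces a single composite index, not one index per attribute.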

Complete Example

Here's a complete mapping file demonstrating multiple features:

name: ecommerce-orders

fields:
  - id: orderId
    key: order_id
    type: String
  - id: customerId
    key: customer_id
    type: String
  - id: customerEmail
    key: email
    type: String
  - id: orderDate
    key: order_date
    type: String
    convertType: Datetime
  - id: totalAmount
    key: total
    type: String
    convertType: Number
  - id: status
    key: order_status
    type: String
  - id: productIds
    key: product_ids
    type: List

nodes:
  - id: customerNode
    labels:
      - Customer
    attributes:
      - id: customerId
      - id: customerEmail
        newId: email
    merge:
      labels:
        - Customer
      attributes:
        - customerId

  - id: orderNode
    conditions:
      comparisons:
        - field: totalAmount
          op: gt
          value: 0
      contains:
        - orderDate
    labels:
      - Order
    attributes:
      - id: orderId
      - id: orderDate
      - id: totalAmount
      - id: status
    merge:
      labels:
        - Order
      attributes:
        - orderId

  - id: productNode
    conditions:
      contains:
        - productIds
    labels:
      - Product
    iterator:
      id: productIds
      newId: productId
      type: String
    attributes:
      - id: productId
    merge:
      labels:
        - Product
      attributes:
        - productId

relationships:
  - type: PLACED
    required: True
    srcNode: customerNode
    dstNode: orderNode
    attributes:
      - id: orderDate
        newId: placedAt

  - type: CONTAINS
    srcNode: orderNode
    dstNode: productNode
    conditions:
      contains:
        - productIds

indexes:
  - name: customer_id_index
    label: Customer
    attributes:
      - customerId

  - name: order_id_index
    label: Order
    attributes:
      - orderId

  - name: product_id_index
    label: Product
    attributes:
      - productId

Best Practices

  1. Define indexes for merge attributes — Any attribute used in a merge block should have an index to ensure fast lookups.

  2. Use meaningful node IDs — The id field in node definitions is used to reference nodes in relationships. Use descriptive names like customerNode rather than node1.

  3. Validate field presence — Use contains conditions when a field might be missing or null in some records.

  4. Alias for clarity — Use newId to give attributes clear, consistent names in your graph, regardless of source column names.

  5. Start simple — Begin with basic node and relationship definitions, then add conditions and merge logic as needed.
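The first practice is easy to check mechanically. The following sketch assumes the mapping has been loaded into a Python dict and reports node merge attributes that no node index covers. It is a hypothetical lint helper, not a Rodeo feature, and it treats an attribute as covered when any index on one of the merge labels lists it.

```python
def unindexed_merges(mapping):
    """Return (node id, attribute) pairs used in a node merge but not indexed."""
    indexed = {}
    for idx in mapping.get("indexes", []):
        if "label" in idx:                      # relationship indexes are ignored here
            indexed.setdefault(idx["label"], set()).update(idx["attributes"])
    missing = []
    for node in mapping.get("nodes", []):
        merge = node.get("merge")
        if not merge:
            continue
        covered = set()
        for label in merge["labels"]:
            covered |= indexed.get(label, set())
        for attr in merge["attributes"]:
            if attr not in covered:
                missing.append((node["id"], attr))
    return missing

mapping = {
    "nodes": [
        {"id": "customerNode", "merge": {"labels": ["Customer"], "attributes": ["customerId"]}},
        {"id": "orderNode", "merge": {"labels": ["Order"], "attributes": ["orderId"]}},
    ],
    "indexes": [{"name": "customer_id_index", "label": "Customer", "attributes": ["customerId"]}],
}
print(unindexed_merges(mapping))
```

Here orderNode merges on orderId without a matching index on Order, so it is the only entry reported.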

Next Steps

See the CLI Reference to learn how to run your mapping files, or the Configuration File guide to set up your database connection.