Rodeo Mapping Files Guide
Mapping files define how Rodeo transforms source data into graph nodes and relationships. This guide provides a complete reference for creating mapping files.
Overview
A mapping file is a YAML document that tells Rodeo:
- Which fields to extract from your source data
- How to create nodes with labels and attributes
- How to create relationships between nodes
- When to create or merge entities based on conditions
- Which indexes to create for query performance
Basic Structure
A mapping file has three required top-level fields and two optional fields:
name: my-mapping  # Required: unique name for this mapping
fields:  # Required: field definitions from source data
  - id: fieldId
    key: SourceFieldName
    type: String
nodes:  # Required: node definitions
  - id: nodeId
    labels:
      - MyLabel
    attributes:
      - id: fieldId
relationships:  # Optional: relationship definitions
  - type: RELATES_TO
    srcNode: nodeId1
    dstNode: nodeId2
indexes:  # Optional: index definitions
  - name: my_index
    label: MyLabel
    attributes:
      - fieldId
Fields
The fields section defines which data to extract from your source file and how to handle it. Each field maps a source column to an internal identifier used throughout the rest of the mapping.
Basic Field Definition
fields:
  - id: userId    # Internal identifier used in nodes/relationships
    key: user_id  # Column name in source CSV or field name in source JSON
    type: String  # Data type in source
Supported Data Types
| Type | Description |
|---|---|
| String | Text data |
| Number | Numeric data (integers or decimals) |
| Datetime | Date and time values |
| List | Array of values (support in progress) |
| Dictionary | Nested key-value pairs (support in progress) |
Type Conversion
When source data is stored as one type but should be treated as another, use convertType:
fields:
  # String field containing a date
  - id: createdAt
    key: created_at
    type: String
    convertType: Datetime
  # String field containing a number
  - id: amount
    key: total_amount
    type: String
    convertType: Number
  # Numeric timestamp to datetime
  - id: timestamp
    key: unix_timestamp
    type: Number
    convertType: Datetime
Nodes
The nodes section defines how to create graph nodes from your source data.
Basic Node Definition
nodes:
  - id: userNode  # Internal identifier for this node definition
    labels:  # One or more Neo4j labels
      - User
      - Person
    attributes:  # Fields to include as node properties
      - id: userId
      - id: email
      - id: createdAt
Attribute Aliasing
Rename a field when storing it as a node attribute using newId:
nodes:
  - id: userNode
    labels:
      - User
    attributes:
      - id: userId
      - id: email
        newId: emailAddress  # Stored as "emailAddress" on the node
      - id: createdAt
        newId: memberSince
Conditions
Control when nodes are created using conditions. Conditions support two types of checks: comparisons against values and presence checks.
Comparison Conditions
Compare a field's value against a literal value:
nodes:
  - id: premiumUser
    conditions:
      comparisons:
        - field: accountType
          op: eq
          value: "premium"
    labels:
      - User
      - PremiumUser
    attributes:
      - id: userId
Supported comparison operators:
| Operator | Description |
|---|---|
| eq | Equal to |
| neq | Not equal to |
| gt | Greater than |
| gte | Greater than or equal to |
| lt | Less than |
| lte | Less than or equal to |
Presence Conditions
Check if a field exists and is non-null:
nodes:
  - id: verifiedUser
    conditions:
      contains:
        - verificationDate  # Only create node if this field is present
    labels:
      - User
      - Verified
    attributes:
      - id: userId
      - id: verificationDate
Combined Conditions
Use both comparison and presence conditions together. All conditions must be met:
nodes:
  - id: activeAdmin
    conditions:
      comparisons:
        - field: role
          op: eq
          value: "admin"
        - field: loginCount
          op: gte
          value: 1
      contains:
        - lastLoginDate
    labels:
      - User
      - Admin
    attributes:
      - id: userId
      - id: role
      - id: lastLoginDate
Merge Behavior
By default, Rodeo creates new nodes. To update existing nodes instead of creating duplicates, define merge criteria:
nodes:
  - id: userNode
    labels:
      - User
    attributes:
      - id: userId
      - id: email
      - id: name
      - id: lastLogin
    merge:
      labels:
        - User
      attributes:
        - userId  # Match on userId to find existing nodes
When a merge is defined, Rodeo will:
- Search for existing nodes matching the specified labels and attributes
- Update the existing node if found
- Create a new node if no match exists
You can merge on multiple attributes for composite uniqueness by listing each of them under the merge attributes.
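A composite merge could look like the sketch below, assuming the merge attributes list accepts more than one entry. The tenantId field is hypothetical and would need a matching entry in the fields section.

```yaml
nodes:
  - id: userNode
    labels:
      - User
    attributes:
      - id: tenantId  # hypothetical field, for illustration only
      - id: userId
      - id: email
    merge:
      labels:
        - User
      attributes:  # assumed semantics: a node matches only if both values match
        - tenantId
        - userId
```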
Iterators
When a field contains a list, use an iterator to create a node for each item in the list:
fields:
  - id: tagList
    key: tags
    type: List
nodes:
  - id: tagNode
    conditions:
      contains:
        - tagList
    labels:
      - Tag
    iterator:
      id: tagList     # The list field to iterate over
      newId: tagName  # Attribute name for each item
      type: String    # Type of each item in the list
    attributes:
      - id: tagName  # Use the iterator's newId here
      - id: documentId
This creates one Tag node for each item in the tagList array.
Relationships
The relationships section defines connections between nodes.
Basic Relationship Definition
relationships:
  - type: WORKS_FOR        # Relationship type in Neo4j
    srcNode: employeeNode  # Source node id (from nodes section)
    dstNode: companyNode   # Destination node id (from nodes section)
Relationship Attributes
Add properties to relationships:
relationships:
  - type: PURCHASED
    srcNode: customerNode
    dstNode: productNode
    attributes:
      - id: purchaseDate
      - id: quantity
      - id: price
        newId: purchasePrice
Required Relationships
Mark a relationship as required, by setting required: True, when both source and destination nodes must exist.
When required is set to True and either the source or destination node doesn't exist (due to failed conditions), the record is skipped. Processing continues with the remaining records.
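As a minimal sketch, the required flag sits alongside type, srcNode, and dstNode, in the same way as on the PLACED relationship in the complete example in this guide. The node ids are reused from the basic relationship definition.

```yaml
relationships:
  - type: WORKS_FOR
    required: True  # skip the record if either node was not created
    srcNode: employeeNode
    dstNode: companyNode
```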
Relationship Conditions
Apply conditions to control when relationships are created:
relationships:
  - type: MANAGES
    srcNode: managerNode
    dstNode: employeeNode
    conditions:
      comparisons:
        - field: managerLevel
          op: gte
          value: 2
      contains:
        - departmentId
    attributes:
      - id: departmentId
Relationship Merge
Merge relationships to avoid duplicates:
relationships:
  - type: FOLLOWS
    srcNode: userNode
    dstNode: followedUserNode
    attributes:
      - id: followDate
      - id: userId
        newId: followerId
    merge:
      attributes:
        - followerId
Indexes
The indexes section defines indexes to create before processing. Indexes improve query performance and are especially important for merge operations.
Node Indexes
Create an index on node attributes by giving it a name, the node label, and the list of attributes to index.
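A minimal sketch of a node index on the User label, following the shape of the indexes entry under Basic Structure. The index name user_id_index is illustrative.

```yaml
indexes:
  - name: user_id_index  # illustrative name
    label: User          # node label to index
    attributes:
      - userId
```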
Multi-Attribute Indexes
Create composite indexes by listing multiple attributes under a single index entry.
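A composite index presumably lists several attributes in one entry, as the relationship index in this guide does. A sketch under that assumption, where the index name and the tenantId attribute are hypothetical:

```yaml
indexes:
  - name: user_tenant_index  # illustrative name
    label: User
    attributes:
      - tenantId  # hypothetical attribute, for illustration only
      - userId
```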
Relationship Indexes
Create an index on relationship attributes:
indexes:
  - name: purchase_index
    type: PURCHASED  # Relationship type (use 'type' instead of 'label')
    attributes:
      - purchaseDate
      - purchasePrice
Complete Example
Here's a complete mapping file demonstrating multiple features:
name: ecommerce-orders
fields:
  - id: orderId
    key: order_id
    type: String
  - id: customerId
    key: customer_id
    type: String
  - id: customerEmail
    key: email
    type: String
  - id: orderDate
    key: order_date
    type: String
    convertType: Datetime
  - id: totalAmount
    key: total
    type: String
    convertType: Number
  - id: status
    key: order_status
    type: String
  - id: productIds
    key: product_ids
    type: List
nodes:
  - id: customerNode
    labels:
      - Customer
    attributes:
      - id: customerId
      - id: customerEmail
        newId: email
    merge:
      labels:
        - Customer
      attributes:
        - customerId
  - id: orderNode
    conditions:
      comparisons:
        - field: totalAmount
          op: gt
          value: 0
      contains:
        - orderDate
    labels:
      - Order
    attributes:
      - id: orderId
      - id: orderDate
      - id: totalAmount
      - id: status
    merge:
      labels:
        - Order
      attributes:
        - orderId
  - id: productNode
    conditions:
      contains:
        - productIds
    labels:
      - Product
    iterator:
      id: productIds
      newId: productId
      type: String
    attributes:
      - id: productId
    merge:
      labels:
        - Product
      attributes:
        - productId
relationships:
  - type: PLACED
    required: True
    srcNode: customerNode
    dstNode: orderNode
    attributes:
      - id: orderDate
        newId: placedAt
  - type: CONTAINS
    srcNode: orderNode
    dstNode: productNode
    conditions:
      contains:
        - productIds
indexes:
  - name: customer_id_index
    label: Customer
    attributes:
      - customerId
  - name: order_id_index
    label: Order
    attributes:
      - orderId
  - name: product_id_index
    label: Product
    attributes:
      - productId
Best Practices
- Define indexes for merge attributes: Any attribute used in a merge block should have an index to ensure fast lookups.
- Use meaningful node IDs: The id field in node definitions is used to reference nodes in relationships. Use descriptive names like customerNode rather than node1.
- Validate field presence: Use contains conditions when a field might be missing or null in some records.
- Alias for clarity: Use newId to give attributes clear, consistent names in your graph, regardless of source column names.
- Start simple: Begin with basic node and relationship definitions, then add conditions and merge logic as needed.
Next Steps
See the CLI Reference to learn how to run your mapping files, or the Configuration File guide to set up your database connection.