Creating sample data for Nonprofit Cloud with CumulusCI and Claude (or the LLM of your choice)

A couple of weeks ago, I shared my workflow for generating volunteer data using Snowfakery to build the structure, CumulusCI to extract it to SQL, and ChatGPT to make it realistic. That approach worked, but I’ve since made some significant improvements:
What’s gone: Snowfakery recipes
What’s new: Custom CumulusCI tasks that extract exactly what an LLM needs to understand Nonprofit Cloud’s data models and generate both mapping files and realistic data directly.
The result? Fewer steps and fewer iterations to get working data, and a workflow that’s easier to repeat across different functional areas. Although I’ve focused my attention on Nonprofit Cloud, the process would work the same way for any Salesforce data model.
A note about LLMs: I switched from ChatGPT to Claude for this round, though I used a bit of both when I was running up against my session limits. Both LLMs generally handled the tasks well enough but required a fair amount of handholding to get from an initial set of files to ones that actually worked. Claude does have one nice advantage: when you provide an error message it will automatically correct the related file in place; ChatGPT tends to give you the text of the correction and instructions on where in the file to paste it.
After a recent run, I did a retrospective with Claude to ask what inputs and prompts would have helped it get closer to the correct answer the first time. I did more data generation runs using those improvements and continued to iterate. The refined process is what I’m documenting here. I’ll continue to update the linked prompt as I discover new refinements.
Nonprofit Cloud’s data models are complex. Understanding how objects relate — how a Gift Transaction connects to a Commitment, which connects to a Person Account, which connects to a Party Relationship Group — is much easier when you have concrete examples to explore.
But manually creating hundreds of interconnected records is tedious and error-prone. And generic fake data doesn’t tell convincing stories about real nonprofit operations.
This workflow lets you generate complete, realistic datasets that demonstrate actual nonprofit scenarios: volunteer programs with qualified volunteers, fundraising campaigns with varied giving patterns, case management with complete service delivery lifecycles.
This also makes it much easier to switch which story you want to tell once you have a working mapping and initial dataset. Say you have a volunteer management narrative built around an animal shelter and now you want to switch to a food bank. You can ask Claude to do that work for you. (I don’t want to say trivial until I’ve tried it and seen how much iterating it takes, but it might be trivial. That experiment might make for a fun follow-up post.)
Here’s the high-level process:

1. Clone the npc-exploration repository, which includes the custom tasks.
2. Grab the relevant ERD from Salesforce’s Data Model Gallery.
3. Spin up a scratch org with Nonprofit Cloud installed.
4. Extract the org’s object list, then detailed metadata for the objects in the ERD.
5. Give the LLM the ERD, the metadata, and a scenario prompt to generate a mapping file and SQL dataset.
6. Load the dataset with CumulusCI, feeding any errors back to the LLM.
This workflow uses custom CumulusCI tasks I’ve built specifically for this purpose. Start by cloning the repository and navigating into it:
git clone https://github.com/Sundae-Shop-Consulting/npc-exploration.git
cd npc-exploration
This repository includes the custom tasks for extracting metadata and several example datasets you can explore.
Salesforce’s Data Model Gallery includes comprehensive Entity Relationship Diagrams (ERDs) for each functional area. These diagrams show exactly which objects are involved and how they relate to each other.
For example, the Fundraising ERD shows the objects involved (Gift Transaction, Gift Commitment, Campaign, and more) and how they relate to one another.
Take a screenshot of the relevant ERD; this becomes your blueprint for the dataset scope.
Before extracting metadata, you’ll need a scratch org with Nonprofit Cloud installed:
cci org scratch dev myorg
cci flow run dev_org --org myorg
The dev_org flow enables Nonprofit Cloud features and any dependencies, giving you a clean environment with all the objects you’ll need to reference. If you’re using this process outside of Nonprofit Cloud, you’ll want to modify the scratch org definition and dev_org flow to match your requirements. Note that dev_org is a standard CumulusCI flow that I’ve extended with my own tasks.
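If you do need to extend a standard flow, CumulusCI lets you append steps in your project’s cumulusci.yml. A minimal sketch, where enable_my_features is a hypothetical task of your own (not from the repo):

```yaml
flows:
  dev_org:
    steps:
      # A high step number appends after the standard flow's steps;
      # "enable_my_features" is a placeholder for your own task.
      100:
        task: enable_my_features
```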
ERDs typically use user-friendly labels (“Gift Transaction”) rather than API names (“GiftTransaction”). To help the LLM translate these, generate a complete list of objects from your org:
cci task run list_org_objects --org myorg --output_path ~/Desktop/org_objects.txt
This is a custom CumulusCI task, backed by custom Python code included in the npc-exploration repo.
This outputs every object in your org with both its label and API name:
List of objects in org
=======================
AIApplication (AI Application) [standard]
AIApplicationConfig (AI Application config) [standard]
AIInsightAction (AI Insight Action) [standard]
AIInsightFeedback (AI Insight Feedback) [standard]
AIInsightReason (AI Insight Reason) [standard]
AIInsightValue (AI Insight Value) [standard]
AIPredictionEvent (AI Prediction Event) [standard]
AIRecordInsight (AI Record Insight) [standard]
AcceptedEventRelation (Accepted Event Relation) [standard]
Account (Account) [standard]
...
You’ll provide this list to the LLM along with the ERD screenshot, and it will figure out which API names correspond to the objects shown in the diagram.
Give the LLM a prompt like:
I'd like a comma-separated list of object API names for all objects in the attached ERD. The attached object list will provide the API names that correspond to the labels in the diagram. Follow this list precisely; do not guess at API names.
Once the LLM identifies the correct API names from the ERD, you’ll extract comprehensive metadata for those objects. The list after --objects is pasted from your LLM’s results. --output_path is wherever you want the results; since this is just more reference material for your LLM, it’s not critical to save it as part of your project.
cci task run generate_object_reference --org myorg \
--objects "Account,Contact,AccountContactRelation,Campaign,OutreachSourceCode,PaymentInstrument,GiftBatch,GiftDesignation,Opportunity,GiftCommitment,GiftCommitmentSchedule,GiftDefaultDesignation,GiftCmtChangeAttrLog,GiftTransaction,GiftTransactionDesignation,GiftRefund,GiftSoftCredit,GiftTribute,GiftEntry,Task" \
--output_path ~/Desktop/fundraising_objects.txt
This generates a YAML file with comprehensive metadata for each object:
objects:
  - name: Account
    fields:
      - name: Name
        type: string
        access: read_write
        is_required: false
      - name: RecordTypeId
        type: reference
        access: read_write
        is_required: false
        reference:
          reference_to:
            - RecordType
      - name: PersonMailingStateCode
        type: picklist
        access: read_write
        picklist:
          values_first:
            - Alabama
            - Alaska
            - Arizona
          values_total: 384
The critical information this captures: which fields are required (is_required: true), the valid picklist values, and which objects each lookup field references.
This object reference becomes the foundation for generating valid mapping files and realistic data.
Now you have everything the LLM needs: the ERD for understanding relationships and the object reference for technical constraints. Provide these to your LLM along with a scenario prompt:
Using the attached ERD and object reference, create a
CumulusCI mapping file and SQL dataset for fundraising scenarios with:
- 7 campaigns (annual fund, major gifts, special events, planned giving)
- 102 individual donors with varied giving patterns
- Multiple gift types (one-time, recurring, pledges)
- Proper soft credits and tribute gifts
- Realistic names, addresses, and donation amounts
You can also draw from other resources instead of providing your own scenarios. I’ve been providing the Nonprofit Cloud Developer Guide as an input and asking for examples that follow that guide.
Using the attached ERD, object reference, and NPC developer guide, create a
CumulusCI mapping file and SQL dataset that highlights fundraising features.
To reduce the need for a lot of back-and-forth troubleshooting, copy and paste these instructions to your LLM, either as part of the above prompt or in the project instructions:
https://gist.github.com/lmeerkatz/53c613887d1c7f1b92fc2309fa0ebede
Whenever I run into another error that seems likely to pop up again, I add to those instructions so I can get closer to creating a working dataset in one shot.
The LLM generates both files: a CumulusCI mapping file (fundraising_mapping.yml) that defines which objects and fields to insert, in what order, and how lookups resolve, and a SQL file (fundraising_data.sql) containing realistic data that matches those definitions.
Because the object reference includes actual picklist values, required field information, and lookup relationships, the LLM generates data that’s much more likely to load successfully on the first try.
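For context, a CumulusCI mapping file is a YAML document of load steps: each step names an sObject, the SQL table it reads from, the fields to load, and any lookups to previously loaded tables. A hypothetical fragment (step and field names are illustrative, not copied from the generated files):

```yaml
Insert Account:
  sf_object: Account
  table: Account
  fields:
    - Name

Insert GiftTransaction:
  sf_object: GiftTransaction
  table: GiftTransaction
  fields:
    - Name
    - TransactionDate
  lookups:
    DonorId:
      table: Account  # resolves each row's donor to an already-loaded Account
```

Load order matters: parents like Account come before children like GiftTransaction so the lookup values can resolve.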
Once you have your mapping and SQL files, loading the data is straightforward …
cci task run load_dataset \
--mapping datasets/fundraising/fundraising_mapping.yml \
--sql_path datasets/fundraising/fundraising_data.sql \
--org myorg
… but it may not work the first time. As you get error messages in your terminal, paste them back into your LLM and it will likely be able to provide corrections without additional context.
✔ ERDs provide the right scope
Starting with Salesforce’s official ERDs ensures you’re including all the objects you need and understanding their relationships correctly.
✔ No manual API name lookup
The LLM matches diagram labels to API names automatically using your org’s object list.
✔ No Snowfakery wrestling
You’re not fighting with YAML syntax or trying to coax Snowfakery into generating data that passes Nonprofit Cloud’s validation rules.
✔ Fewer iterations
The object reference gives the LLM everything it needs to generate valid data structures on the first attempt — or at least fewer attempts than it required before.
✔ Better realism
Because you control the scenario prompt, you can create datasets that tell specific nonprofit stories.
✔ It’s reusable
Once you have a mapping file and SQL file, you can load that exact dataset into any scratch org or sandbox repeatedly.
✔ It’s shareable
Push your mapping and SQL to GitHub and your whole team can use the same sample data—perfect for training, demos, or collaborative exploration.
I’ve added these custom tasks to my npc-exploration repository to make this workflow possible:
list_org_objects:
  description: List API names and labels of every object in the org
  group: NPC Metadata Utilities
  class_path: tasks.list_org_objects.ListOrgObjects

generate_object_reference:
  description: Generate a text data dictionary for selected objects
  group: NPC Metadata Utilities
  class_path: tasks.object_reference.GenerateObjectReference
I’ve used this workflow to create five complete datasets for Nonprofit Cloud: Fundraising, Household & Group Memberships, Case Management, Grantmaking, and Volunteer Management (and more have probably been added since I last updated this post!).
Those datasets, the custom CCI tasks, and comprehensive documentation are available in the npc-exploration repository. Each dataset includes both technical implementation files and user-friendly wiki documentation.
Fair warning: This workflow requires command-line comfort and some familiarity with CumulusCI. I’m working on some additional resources to help folks navigate the initial setup.
If you have scenarios you’d like to see demonstrated or ideas for improving this workflow I’d love to hear them!