Creating sample data for Nonprofit Cloud with CumulusCI and Claude (or the LLM of your choice)

A couple of weeks ago, I shared my workflow for generating volunteer data using Snowfakery to build the structure, CumulusCI to extract it to SQL, and ChatGPT to make it realistic. That approach worked, but I’ve since made some significant improvements:
What’s gone: Snowfakery recipes
What’s new: Custom CumulusCI tasks that extract exactly what an LLM needs to understand Nonprofit Cloud’s data models and generate both mapping files and realistic data directly.
The result? Fewer steps and fewer iterations to get working data, and a workflow that’s easier to repeat across different functional areas. Although I’ve focused my attention on Nonprofit Cloud, the process would work the same way for any Salesforce data model.
A note about LLMs: I switched from ChatGPT to Claude for this round, though I used a bit of both when I was running up against my session limits. Both LLMs generally handled the tasks well enough but required a fair amount of handholding to get from an initial set of files to ones that actually worked. Claude does have one nice advantage: when you provide an error message it will automatically correct the related file in place; ChatGPT tends to give you the text of the correction and instructions on where in the file to paste it.
After a recent run, I did a retrospective with Claude to ask what inputs and prompts would have helped it get closer to the correct answer the first time. I did more data generation runs using those improvements and continued to iterate. The refined process is what I’m documenting here. I’ll continue to update the linked prompt as I discover new refinements.
Nonprofit Cloud’s data models are complex. Understanding how objects relate — how a Gift Transaction connects to a Commitment, which connects to a Person Account, which connects to a Party Relationship Group — is much easier when you have concrete examples to explore.
But manually creating hundreds of interconnected records is tedious and error-prone. And generic fake data doesn’t tell convincing stories about real nonprofit operations.
This workflow lets you generate complete, realistic datasets that demonstrate actual nonprofit scenarios: volunteer programs with qualified volunteers, fundraising campaigns with varied giving patterns, case management with complete service delivery lifecycles.
This also makes it much easier to switch which story you want to tell once you have a working mapping and initial dataset. Say you have a volunteer management narrative built around an animal shelter and now you want to switch to a food bank. You can ask Claude to do that work for you. (I don’t want to say trivial until I’ve tried it and seen how much iterating it takes, but it might be trivial. That experiment might make for a fun follow-up post.)
Here’s the high-level process:

1. Clone the npc-exploration repository, which includes the custom tasks.
2. Grab the relevant ERD from Salesforce’s Data Model Gallery.
3. Spin up a scratch org with Nonprofit Cloud installed.
4. Extract the org’s object list, then detailed metadata for the objects in the ERD.
5. Give the LLM the ERD, the metadata, and a scenario prompt to generate a mapping file and SQL dataset.
6. Load the dataset with CumulusCI, feeding any errors back to the LLM.
This workflow uses custom CumulusCI tasks I’ve built specifically for this purpose. Start by cloning the repository and navigating into it:
git clone https://github.com/Sundae-Shop-Consulting/npc-exploration.git
cd npc-exploration
This repository includes the custom tasks for extracting metadata and several example datasets you can explore.
Salesforce’s Data Model Gallery includes comprehensive Entity Relationship Diagrams (ERDs) for each functional area. These diagrams show exactly which objects are involved and how they relate to each other.
For example, the Fundraising ERD shows the objects involved (Gift Transaction, Gift Commitment, Campaign, and more) and how they relate to one another.
Take a screenshot of the relevant ERD; this becomes your blueprint for the dataset scope.
Before extracting metadata, you’ll need a scratch org with Nonprofit Cloud installed:
cci org scratch dev myorg
cci flow run dev_org --org myorg
The dev_org flow enables Nonprofit Cloud features and any dependencies, giving you a clean environment with all the objects you’ll need to reference. If you’re using this process outside of Nonprofit Cloud, you’ll want to modify the scratch org definition and dev_org flow to match your requirements. Note that dev_org is a standard CumulusCI flow that I’ve extended with my own tasks.
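If you do need to extend a standard flow, CumulusCI lets you append steps in your project’s cumulusci.yml. A minimal sketch, where enable_my_features is a hypothetical task of your own (not from the repo):

```yaml
flows:
  dev_org:
    steps:
      # A high step number appends after the standard flow's steps;
      # "enable_my_features" is a placeholder for your own task.
      100:
        task: enable_my_features
```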
ERDs typically use user-friendly labels (“Gift Transaction”) rather than API names (“GiftTransaction”). To help the LLM translate these, generate a complete list of objects from your org:
cci task run list_org_objects --org myorg --output_path ~/Desktop/org_objects.txt
This is a custom CumulusCI task, backed by custom Python code included in the npc-exploration repo.
This outputs every object in your org with both its label and API name:
List of objects in org
=======================
AIApplication (AI Application) [standard]
AIApplicationConfig (AI Application config) [standard]
AIInsightAction (AI Insight Action) [standard]
AIInsightFeedback (AI Insight Feedback) [standard]
AIInsightReason (AI Insight Reason) [standard]
AIInsightValue (AI Insight Value) [standard]
AIPredictionEvent (AI Prediction Event) [standard]
AIRecordInsight (AI Record Insight) [standard]
AcceptedEventRelation (Accepted Event Relation) [standard]
Account (Account) [standard]
...
You’ll provide this list to the LLM along with the ERD screenshot, and it will figure out which API names correspond to the objects shown in the diagram.
Give the LLM a prompt like:
I'd like a comma-separated list of object API names for all objects in the attached ERD. The attached object list will provide the API names that correspond to the labels in the diagram. Follow this list precisely; do not guess at API names.
Once the LLM identifies the correct API names from the ERD, you’ll extract comprehensive metadata for those objects. The list after --objects is pasted from your LLM’s results. --output_path is wherever you want the results; since this is just more reference material for your LLM, it’s not critical to save it as part of your project.
cci task run generate_object_reference --org myorg \
--objects "Account,Contact,AccountContactRelation,Campaign,OutreachSourceCode,PaymentInstrument,GiftBatch,GiftDesignation,Opportunity,GiftCommitment,GiftCommitmentSchedule,GiftDefaultDesignation,GiftCmtChangeAttrLog,GiftTransaction,GiftTransactionDesignation,GiftRefund,GiftSoftCredit,GiftTribute,GiftEntry,Task" \
--output_path ~/Desktop/fundraising_objects.txt
This generates a YAML file with comprehensive metadata for each object:
objects:
  - name: Account
    fields:
      - name: Name
        type: string
        access: read_write
        is_required: false
      - name: RecordTypeId
        type: reference
        access: read_write
        is_required: false
        reference:
          reference_to:
            - RecordType
      - name: PersonMailingStateCode
        type: picklist
        access: read_write
        picklist:
          values_first:
            - Alabama
            - Alaska
            - Arizona
          values_total: 384
The critical information this captures: which fields are required (is_required: true), the valid picklist values, and which objects each lookup field references.
This object reference becomes the foundation for generating valid mapping files and realistic data.
Now you have everything the LLM needs: the ERD for understanding relationships and the object reference for technical constraints. Provide these to your LLM along with a scenario prompt:
Using the attached ERD and object reference, create a
CumulusCI mapping file and SQL dataset for fundraising scenarios with:
- 7 campaigns (annual fund, major gifts, special events, planned giving)
- 102 individual donors with varied giving patterns
- Multiple gift types (one-time, recurring, pledges)
- Proper soft credits and tribute gifts
- Realistic names, addresses, and donation amounts
You can also draw from other resources instead of providing your own scenarios. I’ve been providing the Nonprofit Cloud Developer Guide as an input and asking for examples that follow that guide.
Using the attached ERD, object reference, and NPC developer guide, create a
CumulusCI mapping file and SQL dataset that highlights fundraising features.
To reduce the need for a lot of back-and-forth troubleshooting, copy and paste these instructions to your LLM, either as part of the above prompt or in the project instructions:
https://gist.github.com/lmeerkatz/53c613887d1c7f1b92fc2309fa0ebede
Whenever I run into another error that seems likely to pop up again, I add to those instructions so I can get closer to creating a working dataset in one shot.
The LLM generates both files: a CumulusCI mapping file (fundraising_mapping.yml) that defines which objects and fields to insert, in what order, and how lookups resolve, and a SQL file (fundraising_data.sql) containing realistic data that matches those definitions.
Because the object reference includes actual picklist values, required field information, and lookup relationships, the LLM generates data that’s much more likely to load successfully on the first try.
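For context, a CumulusCI mapping file is a YAML document of load steps: each step names an sObject, the SQL table it reads from, the fields to load, and any lookups to previously loaded tables. A hypothetical fragment (step and field names are illustrative, not copied from the generated files):

```yaml
Insert Account:
  sf_object: Account
  table: Account
  fields:
    - Name

Insert GiftTransaction:
  sf_object: GiftTransaction
  table: GiftTransaction
  fields:
    - Name
    - TransactionDate
  lookups:
    DonorId:
      table: Account  # resolves each row's donor to an already-loaded Account
```

Load order matters: parents like Account come before children like GiftTransaction so the lookup values can resolve.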
Once you have your mapping and SQL files, loading the data is straightforward …
cci task run load_dataset \
--mapping datasets/fundraising/fundraising_mapping.yml \
--sql_path datasets/fundraising/fundraising_data.sql \
--org myorg
… but it may not work the first time. As you get error messages in your terminal, paste them back into your LLM and it will likely be able to provide corrections without additional context.
✔ ERDs provide the right scope
Starting with Salesforce’s official ERDs ensures you’re including all the objects you need and understanding their relationships correctly.
✔ No manual API name lookup
The LLM matches diagram labels to API names automatically using your org’s object list.
✔ No Snowfakery wrestling
You’re not fighting with YAML syntax or trying to coax Snowfakery into generating data that passes Nonprofit Cloud’s validation rules.
✔ Fewer iterations
The object reference gives the LLM everything it needs to generate valid data structures on the first attempt — or at least fewer attempts than it required before.
✔ Better realism
Because you control the scenario prompt, you can create datasets that tell specific nonprofit stories.
✔ It’s reusable
Once you have a mapping file and SQL file, you can load that exact dataset into any scratch org or sandbox repeatedly.
✔ It’s shareable
Push your mapping and SQL to GitHub and your whole team can use the same sample data—perfect for training, demos, or collaborative exploration.
I’ve added these custom tasks to my npc-exploration repository to make this workflow possible:
list_org_objects:
  description: List API names and labels of every object in the org
  group: NPC Metadata Utilities
  class_path: tasks.list_org_objects.ListOrgObjects

generate_object_reference:
  description: Generate a text data dictionary for selected objects
  group: NPC Metadata Utilities
  class_path: tasks.object_reference.GenerateObjectReference
I’ve used this workflow to create five complete datasets for Nonprofit Cloud: Fundraising, Household & Group Memberships, Case Management, Grantmaking, and Volunteer Management (and more have probably been added since I last updated this post!).
Those datasets, the custom CCI tasks, and comprehensive documentation are available in the npc-exploration repository. Each dataset includes both technical implementation files and user-friendly wiki documentation.
Fair warning: This workflow requires command-line comfort and some familiarity with CumulusCI. I’m working on some additional resources to help folks navigate the initial setup.
If you have scenarios you’d like to see demonstrated or ideas for improving this workflow I’d love to hear them!