Appearance
Data Migration and Cleanup β
Your CRM is only as good as the data inside it. Migrating dirty data into a shiny new system just gives you a shiny new system full of garbage β invest in cleanup before migration, not after.
Why This Matters β
- π’ Owner: Bad data costs businesses an estimated 15-25% of revenue through missed opportunities, duplicated outreach, and flawed forecasting. Data migration is your one chance to start clean.
- π» Dev: You will build the migration scripts, field mappings, and validation rules. Understanding the data landscape upfront prevents weeks of rework when production data behaves nothing like test data.
- π PM: Data migration is the most commonly underestimated workstream in CRM projects. If you do not plan for it explicitly, it will blow your timeline.
- π¨ Designer: The data model determines what you can display and how. Inconsistent or missing data means empty fields, broken layouts, and user frustration on day one.
The Concept (Simple) β
Think of data migration like moving to a new house.
You would not pack every item you own β including the broken furniture, expired food, and boxes of junk from the garage β load it all into a truck, and dump it in your new home. You would sort first. Keep what is valuable, toss what is trash, fix what is worth repairing, and organize everything before it goes into the new space.
Data migration works the same way. Before you move a single record to your new CRM, you audit what you have, throw away duplicates and dead records, standardize formats, and map everything to its new home. The move itself is the easy part. The preparation is where the real work happens.
In one sentence: Clean your data before you migrate it, because a new CRM will not fix old data problems β it will amplify them.
How It Works (Detailed) β
The Data Migration Lifecycle β
Data migration is not a single event β it is a structured process with distinct phases.
ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ
β AUDIT βββββΆβ CLEAN βββββΆβ MAP βββββΆβ MIGRATE βββββΆβ VALIDATE β
β β β β β β β β β β
β Inventoryβ β Dedupe β β Old βββΆ β β Extract β β Spot β
β all data β β Purge β β New β β Transformβ β check β
β sources β β Normalizeβ β fields β β Load β β Reconcileβ
β Assess β β Enrich β β Define β β (ETL) β β User β
β quality β β β β rules β β β β acceptanceβ
ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββ
β β
ββββββββββββββ Iterate until quality gates pass βββββββββββββββββPhase 1: Data Audit and Inventory β
Before cleaning anything, you need to know what you have. The audit answers four questions:
- Where does data live? (Spreadsheets, old CRM, email, ERP, sticky notes)
- How much data is there? (Record counts by type)
- How good is it? (Completeness, accuracy, freshness)
- Who owns it? (Which team or person is the source of truth)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA INVENTORY MATRIX β
ββββββββββββββββββ¬ββββββββββ¬βββββββββββββ¬ββββββββββββ¬βββββββββββββ€
β DATA SOURCE β RECORDS β COMPLETE % β FRESHNESS β OWNER β
ββββββββββββββββββΌββββββββββΌβββββββββββββΌββββββββββββΌβββββββββββββ€
β Old CRM β 45,000 β 62% β Mixed β Sales ops β
β Spreadsheets β 12,000 β 35% β Stale β Various β
β Email contacts β 28,000 β 20% β Recent β Individual β
β ERP system β 8,500 β 85% β Current β Finance β
β Business cards β 3,000 β 40% β Stale β Sales reps β
ββββββββββββββββββΌββββββββββΌβββββββββββββΌββββββββββββΌβββββββββββββ€
β TOTAL β 96,500 β β β β
β After dedupe β ~38,000 β (estimate) β β β
ββββββββββββββββββ΄ββββββββββ΄βββββββββββββ΄ββββββββββββ΄βββββββββββββData quality scoring: Rate each data source on four dimensions:
| Dimension | Definition | Score 1 (Bad) | Score 5 (Good) |
|---|---|---|---|
| Completeness | Are required fields populated? | < 30% of fields filled | > 90% of fields filled |
| Accuracy | Is the data correct and current? | Many bounced emails | Verified within 6 months |
| Consistency | Are formats and values standardized? | Free-text everything | Picklists, standard formats |
| Uniqueness | How many duplicates exist? | > 30% duplicate rate | < 5% duplicate rate |
Phase 2: Deduplication Strategies β
Duplicates are the most common and most damaging data quality problem in CRM. They cause reps to call the same prospect twice, reports to overcount pipeline, and automation to fire multiple times.
DEDUPLICATION DECISION TREE
============================
Two records look similar
β
βΌ
βββββββββββββββββββ NO βββββββββββββββββββ
β Same email OR βββββββββββββΆβ Same company + β
β same phone? β β same name (fuzzy β
ββββββββββ¬βββββββββ β match > 85%)? β
β YES ββββββββββ¬ββββββββββ
βΌ β YES
βββββββββββββββββββ ββββββββΌβββββββββββ
β DEFINITE DUPE β β PROBABLE DUPE β
β Auto-merge β β Flag for human β
β β β review β
βββββββββββββββββββ βββββββββββββββββββMerge rules β which record wins?
| Field | Merge Rule |
|---|---|
| Name | Most complete version |
| Keep all; mark primary | |
| Phone | Keep all; mark primary |
| Address | Most recently updated |
| Company | Most complete; standardize format |
| Owner | Most recent activity |
| Activity history | Merge all activities into surviving record |
| Custom fields | Non-blank value wins; flag conflicts for review |
Phase 3: Field Mapping β
Field mapping translates the data model of your old system into the data model of your new CRM. This is where subtle mismatches cause the biggest problems.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FIELD MAPPING TABLE β
ββββββββββββββββββββββ¬βββββββββββββββββββ¬ββββββββββββββββββββββββββ€
β OLD SYSTEM FIELD β NEW CRM FIELD β TRANSFORMATION β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β CustomerName β Account Name β Trim whitespace, title β
β β β case β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β ContactPerson β First Name + β Split on space; last β
β β Last Name β word = Last Name β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β Tel β Phone β Format: +1 (XXX) XXX- β
β β β XXXX β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β Status (free text) β Stage (picklist) β Map to: Prospect, Lead, β
β β β Opportunity, Customer β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β Notes β Description β Truncate to 32,000 char β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β SalesRep (name) β Owner (user ID) β Lookup against user β
β β β table β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β (no equivalent) β Lead Source β Default: "Legacy Import"β
ββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββββ€
β Region code β (no equivalent) β Archive to custom field β
β β β or discard β
ββββββββββββββββββββββ΄βββββββββββββββββββ΄ββββββββββββββββββββββββββCommon mapping pitfalls:
- One field in the old system maps to multiple fields in the new (e.g., full name to first + last)
- Picklist values do not match (e.g., "Hot Lead" vs "Qualified")
- Data types change (e.g., free text to structured picklist)
- Required fields in the new system have no source data
Phase 4: Normalization and Standardization β
Before loading data, apply consistent formatting rules.
| Data Type | Standardization Rule |
|---|---|
| Phone numbers | E.164 format: +1234567890 |
| Email addresses | Lowercase, trim whitespace |
| Company names | Remove Inc/LLC/Ltd variations; standardize casing |
| Addresses | USPS or postal service standard format |
| Dates | ISO 8601: YYYY-MM-DD |
| Currency | Store as cents/integer; separate currency code |
| Names | Title case; trim whitespace; remove honorifics |
| State/Province | Two-letter abbreviation codes |
Phase 5: Cutover Planning and Rollback β
The actual migration event β the cutover β needs to be planned like a surgical operation.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β CUTOVER TIMELINE β
βββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββ€
β T-14 days β Final test migration on staging β
β β Validate record counts and field accuracy β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T-7 days β Freeze changes in old system (or capture β
β β delta records) β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T-1 day β Full backup of old system β
β β Full backup of new system (pre-migration) β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T-0 (cutover) β Run production migration scripts β
β β Validate record counts match β
β β Spot-check 50 records across all types β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T+1 day β Key users validate their data β
β β Fix critical issues β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T+3 days β Go/no-go decision on full rollout β
β β If no-go: execute rollback plan β
βββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββ€
β T+7 days β Decommission old system access (read-only) β
βββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββRollback plan essentials:
- Full backup of the new CRM before migration starts
- Documented steps to restore the pre-migration state
- Clear criteria for when to invoke rollback (e.g., > 5% data loss, critical fields missing)
- The old system remains accessible in read-only mode for at least 30 days
In Practice β
A Real-World Data Migration: 96,000 Records to Clean CRM β
A B2B services company had 96,000 contact records spread across an old CRM, spreadsheets, and email exports. Here is what they found during audit:
- 40% duplicates β the same contacts appeared in multiple sources with different spellings
- 60% of phone numbers were in non-standard formats or disconnected
- 25% of email addresses bounced when validated
- "Status" field had 47 unique free-text values that needed to map to 6 picklist options
After a three-week cleanup effort, they migrated 38,000 clean, deduplicated records. Sales reps reported that the new CRM felt "like it actually knows our customers" because every record had complete, accurate, standardized data.
Anti-Patterns to Avoid β
The "Migrate Everything" Approach β Importing all historical records regardless of quality or relevance. If a contact has not been touched in three years and has no email address, it does not belong in your new CRM.
Cleaning Data After Migration β This always takes three times longer and means users start with a bad experience. Clean first, migrate second.
Skipping Test Migrations β Running the migration script once in production and hoping for the best. Always run at least two full test migrations against realistic data volumes.
One-Person Migration β Assigning data cleanup to one analyst working alone. Business users must validate that their data looks correct β the analyst cannot know that "Acme Corp" and "ACME Corporation" are the same company without domain knowledge.
Ignoring Delta Records β Records created or modified between the test migration and production cutover get lost. Plan for a delta migration or enforce a data freeze window.
Key Takeaways β
- Audit all data sources before cleaning β you cannot fix what you have not measured.
- Deduplication typically reduces record counts by 30-50%. This is a feature, not a bug.
- Field mapping is where most migration bugs originate. Document every transformation rule explicitly.
- Normalize data formats (phone, email, dates, addresses) before migration, not after.
- Run at least two test migrations with production-scale data before the real cutover.
- Always maintain a rollback plan and keep the old system accessible in read-only mode.
- Budget 2-4 weeks for data cleanup β it is the most underestimated phase in every CRM project.
- Business users must validate migrated data. Technical teams alone cannot verify accuracy.
Action Items β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β ROLE-BASED ACTION ITEMS β
ββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π’ Owner β β Mandate a data quality audit before any migration β
β β work begins β
β β β Approve the data retention policy β what migrates β
β β and what gets archived β
β β β Allocate budget for data enrichment tools if β
β β needed (email validation, firmographic data) β
β β β Set quality targets: < 5% duplicates, > 85% β
β β field completeness β
ββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π» Dev β β Build the field mapping document with β
β β transformation rules for every field β
β β β Write migration scripts with validation checks β
β β and error logging β
β β β Run two full test migrations before production β
β β β Create rollback scripts and test them β
ββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π PM β β Build the data inventory matrix with record β
β β counts and quality scores β
β β β Schedule the cutover window and communicate it β
β β to all stakeholders β
β β β Define the go/no-go criteria for post-migration β
β β β Plan the delta migration strategy for late changes β
ββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π¨ Designβ β Review migrated data in the new CRM layouts to β
β β verify fields render correctly β
β β β Design empty-state experiences for records with β
β β incomplete data β
β β β Plan how "Legacy Import" records will display β
β β differently from newly created records β
β β β Test dashboards with migrated data volumes β
ββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββ