Inventory Source Presents at Big Data Jax on Moving and Syncing Big Data in Ecommerce

Inventory Source’s CEO Kelly Dyer and developers Matt Myers and Brad Hamilton had the opportunity this week to present at Big Data Jax, sponsored by NLP Logix, on the topic of Moving and Syncing Big Data for Ecommerce.

Check out the full presentation below!

1. Overview of the Ecommerce Landscape

  • Traditional ecommerce supply chain involves buying, warehousing, and shipping products to customers.
  • Advances in technology and logistics have disrupted this model, leading to exponential changes.
  • Introduction of new models like dropshipping and third-party logistics (3PL).

2. What is Dropshipping?

  • Definition: A retail model where the seller doesn’t own or warehouse inventory.
  • Products are directly shipped from a distributor to the customer.
  • Seller lists the product, sells it, and places the order with the distributor only after making a sale.
  • Examples: Walmart uses dropshipping extensively.

3. Third-Party Logistics (3PL)

  • Sellers ship products to a provider such as Amazon, which handles storage and fulfillment.
  • Example: Fulfillment by Amazon (FBA) leverages Amazon’s logistics network for efficient delivery.

4. Challenges in Dropshipping and Multi-Channel Retail

  • Inventory Complexity: Products can be in multiple locations:
    • Own warehouse.
    • Distributors’ warehouses (virtual inventory).
    • Third-party fulfillment centers like Amazon.
  • Virtual Inventory Risks:
    • Inventory is shared among multiple sellers; stock might sell out before the order is placed.
    • Prices may fluctuate without notice.

5. The Role of Data in Ecommerce

  • Inventory management in dropshipping is a data challenge:
    • Real-time synchronization of inventory across multiple sources is critical.
    • Changes in stock levels and pricing must be tracked continuously.
  • Complex Data Sources: Suppliers use different data formats, including:
    • Emails, faxes, XML feeds, APIs, CSV files, and manual inputs.

6. Lack of Standardization

  • Suppliers and distributors often use inconsistent methods for sharing data.
  • Retailers need teams to clean, standardize, and integrate data for use across platforms.
  • Common tools: Excel for manual processing and data integration.

7. Manual Data Management

  • Many businesses rely on manual processes to:
    • Scrub and cleanse supplier data.
    • Sync data with platforms like Shopify, eBay, and Amazon.
  • This creates inefficiencies and bottlenecks in scaling operations.

8. The Role of Inventory Source

  • Positioned as a solution to address data challenges in dropshipping and multi-channel retail.
  • Tools offered by Inventory Source automate the process of:
    • Ingesting supplier data.
    • Syncing inventory across multiple platforms.

9. Integration Software Overview

  • Functionality: Inventory Source operates as integration software that standardizes product data from various sources (e.g., CSV, XML, and API feeds).
  • Frequency Management: The software tracks how often each source updates its product data, from hourly to twice daily, so that eCommerce platforms carry the most up-to-date product information.

10. Wide Integration Across Sales Channels

  • Supported Platforms: Inventory Source integrates with major online sales platforms such as Amazon (through its API), Shopify, Magento, eBay, Walmart, and Sears, and is designed to handle product data across all of these sales channels.
  • Data Standardization: Product data is ingested and standardized around Inventory Source’s product model, which evolves to accommodate new challenges and requirements in eCommerce.

11. Data Management for E-Commerce

  • Data Movement and Syncing: The software focuses on automating the movement and synchronization of product data across multiple channels and suppliers. This ensures that product availability, prices, and descriptions are accurate on all platforms.
  • Curating and Managing Product Catalogs: Retailers can manage and curate their product catalogs to align with the different sales channels they operate on.

12. The Shift in E-Commerce

  • From Physical Box Management to Data Management: As dropshipping and third-party fulfillment become more common, the eCommerce landscape is evolving from physical inventory management to data management. This shift makes managing data more complex but also opens opportunities for businesses to scale effectively.

13. Automating Product and Inventory Data

  • Inventory Synchronization: When a product goes out of stock, it is essential to reflect this on all marketplaces where the product is being sold. The software ensures inventory levels are synced automatically across channels to prevent overselling or selling out-of-stock items.
  • Challenges with Data Formats: Data comes in various formats such as CSV, XML, JSON, etc., and managing these different protocols is a challenge. Inventory Source solves this by building integrations for each data format, which can be reused for all customers.
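
One way to picture a reusable per-format integration is a small set of parsers that all emit the same row shape. The sketch below is illustrative only: the field names (`sku`, `qty`, `price`) and the feed layouts are assumptions, not the actual Inventory Source product model.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_csv(text):
    """Parse a CSV feed with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Parse a JSON feed that is a top-level array of product objects."""
    return [{k: str(v) for k, v in item.items()} for item in json.loads(text)]


def parse_xml(text):
    """Parse an XML feed shaped like <products><product>...</product></products>."""
    root = ET.fromstring(text)
    return [{child.tag: (child.text or "") for child in product}
            for product in root.iter("product")]


# One parser per format, reused for every supplier that sends that format.
PARSERS = {"csv": parse_csv, "json": parse_json, "xml": parse_xml}


def ingest(feed_format, text):
    """Dispatch to the parser for the given format and return uniform rows."""
    return PARSERS[feed_format](text)


# Hypothetical sample feeds carrying the same product in three formats.
csv_feed = "sku,qty,price\nA1,5,9.99\n"
json_feed = '[{"sku": "A1", "qty": 5, "price": "9.99"}]'
xml_feed = ("<products><product><sku>A1</sku><qty>5</qty>"
            "<price>9.99</price></product></products>")

rows = [ingest("csv", csv_feed), ingest("json", json_feed), ingest("xml", xml_feed)]
```

All values are kept as strings at this stage so that the three parsers agree; type conversion would belong to a later standardization step.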

14. Tools for Retailers

  • Product Selection: Retailers are given the ability to filter and select the products they wish to sell, even from large product catalogs (e.g., tens of thousands of products). Speed and filtering criteria are important for selecting the right items.
  • Content Management: Retailers can modify product titles, descriptions, and images, ensuring that product content aligns with their brand or marketing strategy.
  • Pricing Management: Inventory Source allows retailers to manage pricing across multiple suppliers and maintain a margin on products by adjusting prices based on changes in manufacturer costs or supply chain fluctuations.
  • Category Management: Many marketplaces require products to be categorized in specific ways. The software provides tools to help retailers organize their products into categories and sync these categories with the sales platforms.

15. Integration with Multiple Platforms

  • Multi-Platform Syncing: Inventory Source integrates with around 50 different platforms, enabling retailers to push product data seamlessly across various marketplaces. This wide integration helps eCommerce businesses streamline their operations and maintain consistency across platforms.

16. Order Processing and Syncing

  • Simplified Flow: When a customer purchases a product (e.g., on Amazon), Inventory Source facilitates the process by automatically syncing the order to the supplier.
  • Order Fulfillment: The supplier processes and ships the item, and Inventory Source ensures the shipment details (tracking info, etc.) are automatically synced back to the retailer’s platform.
  • Customer Notifications: The retailer’s platform notifies the customer with shipping details and tracking information.

17. Focus on Retailers

  • The main focus is on easing the retailer’s workload, allowing them to focus on customer experience while automating backend tasks like inventory updates, order processing, and shipment tracking.

18. Data Automation and Analytics

  • Data Automation: The platform uses data automation tools to ensure inventory, product, and order data remain in sync across multiple platforms without requiring manual intervention.
  • ETL (Extract, Transform, Load): The system extracts data from various suppliers, transforms it into a usable format, and loads it into the retailer’s system, ensuring seamless syncing.

19. Scale of Operations

  • Active Products: Inventory Source manages around 10 million active products in its system.
  • Suppliers Integrated: The system integrates with 200 suppliers.
  • Retailer Channels: Over 4,500 retailer channels are synced automatically, ensuring consistency in product and order data across platforms.
  • Daily Product Changes: The platform handles about 2 million data changes daily.
  • Volume of Product Syncing: Last month, the system synced 1.6 billion products across various channels.
  • Order Checkpoints: The system processes approximately 1 million order checkpoints every month, ensuring all data (e.g., order shipments and processing status) is up to date.

20. Backend Infrastructure and Tech Stack

  • Server Farms: Dedicated server farms monitor and handle integrations with suppliers to ensure the data is always up to date.
  • Redis Cache: Inventory Source leverages Redis for caching to manage billions of data points per hour, enabling faster syncing and reducing reliance on databases that may struggle with relational data at scale.
    • Scalability: Redis enables horizontal scaling, which means more servers (shards) can be added to handle increasing data load as the business scales.
    • Performance: The use of Redis enhances system speed and data retrieval efficiency, which is critical for syncing product and order data in real time.

21. Challenges and Solutions

  • Database Limitations: Given the large scale of transactions and data involved, traditional relational databases might not be fast enough to keep up. Instead, Inventory Source uses Redis to bypass database bottlenecks and ensure smooth data flow.
  • Real-Time Data Syncing: Real-time updates are crucial in eCommerce. Without high-speed syncing tools, retailers would struggle to keep inventory data and order statuses accurate, leading to operational inefficiencies and poor customer experiences.

22. Centralized Data Hub

  • Inventory Source centralizes all data processing in a hub to allow for easier and more efficient access to product, inventory, and order information across multiple platforms.
  • This approach helps in managing large amounts of data while reducing delays and errors in syncing across systems.

23. On-Demand Syncing and Scheduling

  • Dynamic Syncing Capabilities: Users can trigger on-demand syncs in addition to scheduled ones, so data updates go out immediately instead of waiting for the next scheduled sync.
  • Improvement of System Performance: The system continuously improves and is able to distribute jobs across servers, ensuring efficiency and faster processing.

24. Frontend Tools for Data Management

  • User-Friendly Tools: Inventory Source provides tools for users to manage their data efficiently. These tools display supplier information, enabling users to manage product content effectively.
  • Solr Integration: To handle massive data volumes efficiently, Inventory Source uses Solr. Solr enables fast searching, filtering, and sorting of data, which is vital when dealing with large datasets like 2.6 million products.

25. Redis for Efficient Data Processing

  • Caching with Redis: Redis is used to identify changes in supplier data, so unchanged data is never persisted to the database and only updated information is stored. This reduces unnecessary transactions, saves resources, and speeds up overall data processing.
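
A common way to detect changed records against a cache is to store a fingerprint per product and compare it on each feed run. The following is a minimal sketch of that idea, not a description of the actual implementation: the `sku` key and the record shape are assumptions, and a plain dict stands in for Redis so the example runs without a server.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a product record; key order does not affect the result."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(feed, cache):
    """Return only the records whose fingerprint differs from the cached one.

    `cache` maps SKU -> last seen fingerprint. It is a dict here; with Redis
    the same lookups would be reads and writes on string keys.
    """
    changed = []
    for record in feed:
        digest = fingerprint(record)
        if cache.get(record["sku"]) != digest:
            cache[record["sku"]] = digest
            changed.append(record)
    return changed


cache = {}
first_run = changed_records(
    [{"sku": "A1", "qty": 5}, {"sku": "B2", "qty": 0}], cache)
second_run = changed_records(
    [{"sku": "A1", "qty": 4}, {"sku": "B2", "qty": 0}], cache)
```

On the first run both records are new, so both are returned. On the second run only `A1` changed, so only that record would be written to the database.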

26. Data Ingestion Process

  • Multiple Sources of Data: Inventory Source ingests data from suppliers and platforms. On the platform side, the focus is mainly on ingesting new orders (e.g., from Amazon) to enable fulfillment.
  • Supplier Data Ingestion: A significant portion of the data comes from suppliers, including inventory data, product pricing, stock status, and detailed product content. This data is standardized into a domain model that can be leveraged across multiple platforms.
  • Importance of Product Content: Unlike many competitors, Inventory Source focuses heavily on product content, including category mapping, product descriptions, and detailed attributes. This ensures a seamless experience for end-users and integrates well with various platforms.
  • Category and Variation Structure: Managing product categories and variations is essential in e-commerce. For example, selling a shirt in multiple sizes and colors requires the system to handle variations correctly to provide a clear shopping experience. Inventory Source pulls variation structures from suppliers, ensuring that the correct products are displayed with relevant options for customers.
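
The shirt example can be sketched as grouping flat supplier rows under a parent product. The column names (`parent_sku`, `sku`, `size`, `color`) are hypothetical; real supplier feeds express variations in many different ways.

```python
from collections import defaultdict


def group_variations(rows):
    """Group flat supplier rows into parent products with their variant options."""
    grouped = defaultdict(lambda: {"variants": [], "sizes": set(), "colors": set()})
    for row in rows:
        parent = grouped[row["parent_sku"]]
        parent["variants"].append(row["sku"])
        parent["sizes"].add(row["size"])
        parent["colors"].add(row["color"])
    return dict(grouped)


# Hypothetical rows for one shirt sold in several sizes and colors.
rows = [
    {"parent_sku": "SHIRT", "sku": "SHIRT-S-RED", "size": "S", "color": "Red"},
    {"parent_sku": "SHIRT", "sku": "SHIRT-M-RED", "size": "M", "color": "Red"},
    {"parent_sku": "SHIRT", "sku": "SHIRT-M-BLUE", "size": "M", "color": "Blue"},
]
catalog = group_variations(rows)
```

A storefront can then render one product page for `SHIRT` with size and color selectors instead of three unrelated listings.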

27. Challenges in Product Content Management

  • Difficulties in Mapping and Centralizing Product Data: Managing and mapping product content from suppliers is a complex process. It requires a central domain structure that can map out to different platforms and accommodate various attributes, especially for complex products like clothing that have many variations.
  • Importance of Categories: Categories are vital for navigation on e-commerce sites, and Inventory Source works to ensure that supplier categories align with platform categories, providing an intuitive shopping experience for users.

Data Volume Management: The presentation emphasizes handling large volumes of data (e.g., millions of products) and processing that data quickly and accurately. Technologies like Solr and Redis make this possible.

Customer Experience: The discussion about product variations and categories highlights the importance of delivering a smooth and efficient customer experience. By properly mapping product variations like size and color, the system can offer a user-friendly interface for consumers to choose the product they want.

Competitive Advantage: Inventory Source’s approach to handling data, especially in product content and variation structures, sets it apart from competitors in the e-commerce space, making it a powerful tool for e-commerce merchants.

28. Actionable Insights:

  1. Focus on Data Quality: Ensuring the accuracy and relevance of data ingested from suppliers is crucial for building a reliable e-commerce business.
  2. Leverage Technology: Tools like Solr and Redis can significantly improve data processing and management efficiency, particularly for businesses handling large amounts of inventory.
  3. Enhance Customer Shopping Experience: Proper categorization and variation management are essential to offering a seamless shopping experience, which in turn boosts customer satisfaction and sales.

29. Importance of Building Import Variation Structures

  • Retailers benefit from having a robust variation structure to offer customers a better shopping experience.
  • Variation structure contributes to higher conversion rates, better product representation, and more efficient sales.
  • The system must allow retailers to integrate and leverage this data seamlessly.

30. Order Acknowledgement and Shipment Tracking

  • After receiving orders from platforms like Jet, it’s crucial for the system to confirm order receipt and process it efficiently.
  • The process includes providing shipment tracking and ensuring customers are notified promptly.
  • Maintaining smooth communication with retailers and customers enhances the overall buying experience.

31. ETL (Extract, Transform, Load) System

  • The system involves robust data workflows for managing multiple integrations efficiently.
  • The core system integrates and maps specific fields from different suppliers, ensuring consistency across platforms.
  • This system ensures product data is accurate, consistent, and ready for use across various platforms.
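
The field-mapping step of such an ETL flow can be sketched as a per-supplier lookup table that renames source columns to a shared domain model. Everything named here (`supplier_a`, `ItemNumber`, `quantity`, and so on) is a made-up example, not a real supplier schema.

```python
from decimal import Decimal

# Hypothetical mapping of each supplier's column names to the shared model.
SUPPLIER_FIELD_MAPS = {
    "supplier_a": {"ItemNumber": "sku", "QtyAvail": "quantity", "Cost": "cost"},
    "supplier_b": {"part_no": "sku", "stock": "quantity", "wholesale": "cost"},
}


def to_domain(supplier, raw):
    """Transform one raw supplier row into the shared product model."""
    mapping = SUPPLIER_FIELD_MAPS[supplier]
    product = {target: raw[source] for source, target in mapping.items()}
    # Normalize types so downstream code never sees supplier-specific quirks.
    product["quantity"] = int(product["quantity"])
    product["cost"] = Decimal(str(product["cost"]))
    return product


a = to_domain("supplier_a", {"ItemNumber": "A1", "QtyAvail": "5", "Cost": "9.99"})
b = to_domain("supplier_b", {"part_no": "A1", "stock": 5, "wholesale": 9.99})
```

Both rows end up as the same dictionary, which is what lets one sync pipeline serve every supplier.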

32. Handling Product Access and Custom Pricing

  • Some suppliers restrict access to products based on specific retailer credentials or pricing tiers.
  • For example, certain products like Nike might be restricted to a select group of retailers to maintain brand value.
  • The system enables custom pricing and product access, ensuring that retailers are only able to sell what they are authorized to sell.

33. Managing Unique Supplier Workflows

  • Different suppliers may have varying data feed frequencies and formats.
  • The platform adapts to unique workflows, whether it’s receiving real-time API data or daily CSV updates, ensuring all updates are processed efficiently.
  • Custom integrations are designed to handle the specific needs of each supplier.

34. Infrastructure and Scalability

  • Inventory Source operates with 15 horizontally scalable servers to handle fluctuating data loads.
  • This ensures that when the load increases (due to more retailers or updates), the infrastructure adapts to prevent service disruptions.
  • The system captures and immediately makes the data available to retailers, enabling fast and efficient operations.

35. Data Synchronization and Management

  • The platform efficiently manages the syncing of product data, even in cases where suppliers provide frequent updates or minimal changes.
  • Using efficient methods to monitor differences in data ensures faster processing and less strain on resources.
  • Accurate and real-time data is critical to the success of both retailers and suppliers in the ecommerce space.

36. Customization and Scheduling of Jobs

  • The platform offers flexibility in how jobs are scheduled, allowing for different frequencies of updates based on supplier needs.
  • For example, an API connection offers real-time updates, while a CSV upload requires periodic checks.
  • The system ensures that only necessary updates are monitored, reducing unnecessary processing.

37. Product Feed and Catalog Management

  • The basic feed from suppliers is ingested and cataloged for the retailer’s use.
  • The system allows retailers to curate their catalog by selecting products that are competitive and profitable.
  • Pricing rules, such as markup percentages and landed costs, are used to help retailers manage pricing and ensure profitability.

38. Advanced Pricing Rules

  • Retailers can apply a variety of pricing rules, including fixed markups, category-specific pricing, or landed costs.
  • This flexibility helps retailers stay competitive in an ever-changing market, even when external factors like shipping costs fluctuate.
  • These rules can be applied to individual products, categories, or brands.
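
As a rough illustration of how such rules might combine, the sketch below computes a landed cost and then applies the most specific markup available. The precedence order (product, then brand, then category, then a default) and all field names are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def landed_cost(product):
    """Supplier cost plus the per-unit costs of getting the item to the customer."""
    return product["cost"] + product["shipping"] + product["dropship_fee"]


def price_for(product, rules, default_markup):
    """Apply the most specific markup rule: SKU, then brand, then category."""
    candidates = (
        ("sku", product["sku"]),
        ("brand", product["brand"]),
        ("category", product["category"]),
    )
    for key in candidates:
        if key in rules:
            markup = rules[key]
            break
    else:
        markup = default_markup
    price = landed_cost(product) * (Decimal("1") + markup)
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


product = {
    "sku": "A1", "brand": "Acme", "category": "Pet Supplies",
    "cost": Decimal("10.00"), "shipping": Decimal("2.00"),
    "dropship_fee": Decimal("1.50"),
}
category_rules = {("category", "Pet Supplies"): Decimal("0.20")}
brand_rules = dict(category_rules)
brand_rules[("brand", "Acme")] = Decimal("0.30")

category_price = price_for(product, category_rules, Decimal("0.15"))
brand_price = price_for(product, brand_rules, Decimal("0.15"))
```

With a landed cost of 13.50, the category rule yields 16.20 and the more specific brand rule yields 17.55. `Decimal` is used instead of floats to avoid rounding drift in prices.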

39. Category Mapping

  • Retailers can reorganize products within their store’s categories, allowing for customized browsing experiences.
  • Category mapping supports deeper organizational structures and personalized content for users.
  • This customization helps streamline the shopping experience for customers by presenting products in a logical, easy-to-navigate way.

40. Content Management

  • Retailers can also manage the product content, such as descriptions and images, ensuring that products are properly represented across platforms.
  • Content mapping allows for consistent product representation, improving the customer shopping experience and ultimately increasing conversions.

41. Challenges with Supplier Categories

  • Suppliers often provide categories with long, repetitive, slash-delimited names (e.g., “dog collars / something”), making them difficult for retailers to manage.
  • Retailers need to centralize and streamline these categories for easier management and better user experience.
  • Centralization includes categorization, content, titles, descriptions, and more.

42. Importance of Content Management

  • Content such as titles, descriptions, images, and videos is crucial for product sales. Many suppliers offer rudimentary or incomplete content (e.g., just part numbers in titles).
  • To enhance the shopping experience, retailers should improve this content by adding images, descriptions, and other media.

43. Order Management and Visualization

  • Retailers need tools to visualize and track orders in various stages, especially when dealing with multiple suppliers.
  • Inventory Source provides tools that allow users to view orders that are pending, partially shipped, or awaiting processing. This ensures quick resolutions for split orders.

44. Handling Big Data in Ecommerce

  • The data challenge involves handling massive product feeds, ranging from 1,000 to 3 million products.
  • Retailers manage unique product subsets based on supplier relationships and pricing differences (e.g., exclusive access to certain warehouses or discounted prices).
  • Filtering, searching, and sorting through these diverse product sets requires specialized tools to optimize workflow and data analysis.

45. Category Mapping and Data Customization

  • Retailers need tools to create specific categories for their own unique subset of products.
  • This includes the ability to see product counts, subcategories, and available inventory, tailored to each retailer’s needs.

46. Advanced Searching Capabilities

  • Retailers can perform detailed searches across multiple data points, such as pricing, category, and availability.
  • Inventory Source’s catalog manager allows for filtering through feeds with millions of product variations in real time.
  • A key feature is the ability to save search filters and create tags, ensuring retailers can consistently manage their inventory with high precision.

47. Speed and Efficiency of Data Management

  • Inventory Source’s platform supports high-speed search across datasets of millions of products.
  • The search tool leverages Solr, a powerful search platform, to quickly filter and display results based on complex criteria.

48. Use of Tags and Search Filters

  • Users can create custom tags and save search criteria for future use. These tags help streamline inventory management and ensure consistency in product offerings.
  • Retailers can combine multiple saved search tags to further refine their product catalogs, all while maintaining high-speed search capabilities.

49. Technology Behind the Solution

  • The platform uses Solr for real-time data processing, which helps manage large volumes of data efficiently.
  • The flat data structure provided by Solr allows for faster querying without the need for complex database joins or creating multiple indexes.

50. Pagination Challenges in Databases

  • Issue: Traditional databases struggle with recalculating product counts when filters are applied during searches.
  • Solution: Inventory Source leverages Solr to instantly calculate product counts, significantly reducing processing time and costs.
  • Importance: Enhances user experience by enabling real-time search results.
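
For context, a filtered and paginated Solr request can return the total match count (`numFound` in the response) and per-value facet counts in the same call that fetches a page of results. The sketch below only builds such a request URL using standard Solr query parameters; the host, core name (`products`), and field names are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, filters, page, page_size, facet_field):
    """Build a Solr select URL with filter queries, paging, and facet counts."""
    params = [
        ("q", "*:*"),                       # match everything, then narrow with fq
        ("wt", "json"),
        ("start", str(page * page_size)),   # zero-based offset of the first row
        ("rows", str(page_size)),
        ("facet", "true"),
        ("facet.field", facet_field),       # counts per value of this field
    ]
    params += [("fq", f) for f in filters]  # each filter is a separate fq
    return base_url + "?" + urlencode(params)


url = build_solr_query(
    "http://localhost:8983/solr/products/select",  # hypothetical host and core
    filters=['category:"Pet Supplies"', "price:[10 TO 50]", "in_stock:true"],
    page=2,
    page_size=25,
    facet_field="category",
)
```

Because the counts come back with the page itself, the interface does not need a separate counting query each time a filter changes.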

51. Syncing Data to Platforms

  • Process Overview:
    • Syncing involves fetching updated data, applying dynamic rules (e.g., price markups, category mappings), and pushing it to eCommerce platforms.
    • Inventory Source updates Redis Cache and Solr to keep data current.
  • Dynamic Rules Applied:
    • Custom rules, such as pricing adjustments or product customizations, are applied before syncing.
    • Only new or changed data is synced, reducing redundant updates.
  • Minimizing Redundancy:
    • Platforms like Shopify and Amazon impose API rate limits.
    • Inventory Source minimizes unnecessary updates by identifying differences in data and syncing only what’s necessary.
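
Syncing only what changed can be sketched as a field-level diff between the last state pushed to a platform and the current state after rules are applied. The record shapes and SKUs below are invented for illustration.

```python
def diff_listing(last_synced, current):
    """Return only the fields whose values changed since the last sync."""
    return {k: v for k, v in current.items() if last_synced.get(k) != v}


def build_updates(last_synced_by_sku, current_by_sku):
    """Map each SKU to its changed fields, skipping SKUs with no changes."""
    updates = {}
    for sku, current in current_by_sku.items():
        delta = diff_listing(last_synced_by_sku.get(sku, {}), current)
        if delta:
            updates[sku] = delta
    return updates


last_synced = {
    "A1": {"price": "16.20", "quantity": 5},
    "B2": {"price": "8.00", "quantity": 0},
}
current = {
    "A1": {"price": "16.20", "quantity": 4},   # quantity changed
    "B2": {"price": "8.00", "quantity": 0},    # unchanged, no API call needed
    "C3": {"price": "3.50", "quantity": 10},   # new product, all fields sent
}
updates = build_updates(last_synced, current)
```

Here three products produce two API payloads, and the `A1` payload carries one field instead of the whole listing, which matters when the platform limits request volume.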

52. API Rate Limiting and Automation Challenges

  • Problem: Platforms like Shopify have strict rate limits, restricting how quickly data can be updated.
  • Solution:
    • Inventory Source optimizes internal processes to minimize latency, ensuring that the bottleneck is not on their side.
    • Retailers can upgrade platform packages or server configurations to improve throughput.
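
On the client side, staying under a platform's rate limit is often handled with a token bucket. The sketch below shows the general technique; it is not taken from the presentation, and the rate and capacity values are placeholders, since actual limits differ by platform and plan.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock      # injectable so the bucket can be tested
        self.sleep = sleep
        self.last = clock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Wait just long enough for the missing fraction of a token.
            self.sleep((1 - self.tokens) / self.rate)


def push_updates(updates, send, bucket):
    """Send each update through `send`, pausing as needed to respect the limit."""
    for update in updates:
        bucket.acquire()
        send(update)
```

A caller would create one bucket per platform account, for example `TokenBucket(rate=2, capacity=2)`, and pass the real API call as `send`.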

53. Integration and Configuration

  • Custom Integrations:
    • Inventory Source builds dynamic integrations for platforms like Shopify, BigCommerce, and Amazon.
    • Each integration is tailored to handle platform-specific configurations, such as API tokens and authentication protocols.
  • Horizontal Scaling:
    • With over 4,500 active processes syncing data, Inventory Source scales its servers horizontally to manage workload.
    • This ensures reliability even with a growing number of retailers and platforms.

54. Real-Time Triggering of Updates

  • Supplier Job Completion:
    • Syncing is triggered immediately after supplier data updates.
    • Ensures retailers’ stores are updated with the most accurate inventory and product data.
  • Impact:
    • Reduces delays between supplier updates and retailer data accuracy.
    • Keeps inventory levels consistent across multiple channels.

55. Data Analytics and Future Opportunities

  • Current State:
    • Inventory Source accumulates vast amounts of data but currently focuses on syncing rather than in-depth analytics.
  • Opportunities in Analytics:
    • Market Intelligence:
      • Identifying trending products based on order volumes.
      • Offering retailers insights into high-demand products for improved sales strategies.
    • Pricing Analysis:
      • Providing benchmarks for optimal pricing strategies based on platform and product performance.
    • Channel-Specific Insights:
      • Highlighting which platforms perform best for specific product categories, e.g., hunting supplies selling better on Amazon.

56. Supplier Integration Challenges

  • Suppliers often restrict publicizing data or integrations, necessitating custom agreements.
  • Exclusive integration options are akin to exclusive web templates—more expensive but tailored.
  • Many clients avoid sharing Inventory Source’s role due to competitive concerns.

57. Responsible Data Usage

  • Ensures data syndication remains accurate and avoids misuse.
  • The software-as-a-service model charges fees based on integration volume and specific features.
  • Partnerships with sourcing platforms lead to direct referrals.

58. SEO and Inbound Marketing

  • Inventory Source ranks highly for dropshipping tools due to robust SEO.
  • SEO strategies focus on inbound marketing and content optimization.

59. Technical Monitoring and Testing

  • Uses tools like dashboards to detect and address failures before they impact users.
  • Preemptive monitoring of data feeds ensures discrepancies (e.g., incorrect file formats) don’t disrupt stores.
  • Employs thresholds to block problematic feeds and maintain system integrity.
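
A threshold check of this kind might look like the sketch below, which blocks a feed when it is empty, when its row count drops sharply against the previous run, or when too many rows are missing required fields. The specific thresholds (50% drop, 10% invalid) and field names are assumptions chosen for the example.

```python
def should_block_feed(previous_count, rows, required_fields,
                      max_drop=0.5, max_invalid=0.1):
    """Return a reason string if the feed should be blocked, else None."""
    if not rows:
        return "empty feed"
    # A sudden drop in row count usually means a truncated or wrong file.
    if previous_count and len(rows) < previous_count * (1 - max_drop):
        return "row count dropped too far"
    invalid = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )
    if invalid / len(rows) > max_invalid:
        return "too many invalid rows"
    return None


required = ["sku", "quantity"]
good_feed = [{"sku": f"S{i}", "quantity": "1"} for i in range(100)]
truncated_feed = good_feed[:40]
broken_feed = [{"sku": "", "quantity": "1"}] * 20 + good_feed[:80]

good_result = should_block_feed(100, good_feed, required)
truncated_result = should_block_feed(100, truncated_feed, required)
broken_result = should_block_feed(100, broken_feed, required)
empty_result = should_block_feed(100, [], required)
```

Blocking the feed leaves the last good data in place, so a malformed file from a supplier cannot zero out a retailer's store.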

60. Dynamic Data Management

  • Flexible fields for custom attributes are easily integrated.
  • Actionable fields, like surcharge costs or dropshipping fees, are added to the core system when valuable.

61. Error Handling and Reconciliation

  • Failures occur due to supplier feed changes or API modifications.
  • A robust alert system notifies the dev team of issues.
  • Reconciliation processes keep track of successful and failed data transfers at the product and channel levels.
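
Tracking outcomes at the product and channel levels can be pictured as a simple tally over sync results. This is a sketch under assumed inputs: each result is a hypothetical `(channel, sku, ok)` tuple, and the channel names are examples.

```python
from collections import Counter


def reconcile(results):
    """Tally sync outcomes per channel and collect the SKUs that failed."""
    tally = Counter()
    failed = {}
    for channel, sku, ok in results:
        tally[(channel, "ok" if ok else "failed")] += 1
        if not ok:
            failed.setdefault(channel, []).append(sku)
    return tally, failed


results = [
    ("shopify", "A1", True),
    ("shopify", "B2", False),
    ("amazon", "A1", True),
]
tally, failed = reconcile(results)
```

The per-channel failure list is what an alerting system or a retry job would consume.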

62. Scalability

  • Multi-threaded servers handle concurrent jobs efficiently.
  • Scalability decisions depend on job completion thresholds rather than fixed query limits.

63. Future Plans

  • Plans to develop revenue cycle management for billing, reconciliation, and order information tracking.
  • Upcoming features aim to streamline data flow from retailers to suppliers.

64. Customer Data Privacy

  • Minimal customer data is shared, focusing only on essential order details.
  • Retailers retain control over their data for further processing.

65. Infrastructure and Scalability Decisions

  • Server management ensures optimal resource allocation for active jobs.
  • Decisions to scale depend on performance benchmarks and job timelines.

66. Choosing Technology: Solr vs. Elasticsearch

  • Solr as a Legacy Tool
    • Solr was selected not through active decision-making but as an inherited tool.
    • The team acknowledges the challenges of switching due to the high costs and risks associated with replacing a well-functioning system.
    • Solr continues to work effectively despite advancements in distributed platforms like Elasticsearch.
  • Comparative Insights
    • Solr supports distributed setups through sharding and replication.
    • Although more manual in configuration, it is reliable and integrates seamlessly with the existing system.
    • Elasticsearch, while more user-friendly and click-to-deploy, wasn’t seen as necessary for their specific needs.

67. Scalability and Redundancy with Solr

  • Sharding and Replication
    • Solr is configured to support multiple instances and read replicas, ensuring scalability.
    • Load balancers distribute traffic across Solr shards for efficiency.
  • Challenges of Manual Configuration
    • Solr requires additional manual effort for configuration compared to modern tools.
    • This includes tasks like setting thresholds and load balancing.

68. Handling Data Source Failures

  • Automated Fail-Safes
    • Processes are halted if a data source fails to meet thresholds or experiences downtime.
    • This prevents the propagation of bad data through the system.
  • Addressing Failures
    • The team works to resolve issues quickly, often within hours.
    • Failures are typically due to supplier-side server issues, not system errors.
  • Communication and Notifications
    • Tools notify customers of outages, enabling them to mark inventory as “out of stock” temporarily.
    • Reaction protocols depend on the severity of the issue and the supplier’s downtime.

 

69. Data Integrity and Customer Impact

  • Avoiding Bad Data Propagation
    • Emphasis on confirming data accuracy before reintroducing it into the system.
    • Proactive steps are taken to avoid syncing incorrect inventory levels.
  • Customer-Centric Approach
    • Inventory Source provides tools to help customers manage temporary disruptions.
    • Clear communication ensures transparency during outages.

70. Final Takeaways

  • Importance of Proactive Infrastructure
    • The current system, though manual, ensures high reliability and adaptability to changes.
  • Commitment to Accuracy
    • The focus remains on maintaining data integrity, even if it means slowing processes temporarily.
  • Evolving Systems with Practicality
    • While new tools like Elasticsearch offer advantages, the cost-benefit analysis favors continuing with Solr for now.