Character.AI’s generative AI systems use a variety of data sources, including publicly available data; text content and interaction data from users; internally generated data from safety training, engineering testing, and red-teaming exercises; synthetic data created internally or from open source repositories; and data from third parties through commercial agreements. Please see our Terms of Service, Privacy Policy, and Regional Privacy Disclosures for further information about our services and data use and collection practices.
We started collecting and using data for model development in 2021. We also regularly use data (including large-scale datasets) as part of what we call “post-training,” which is a process of customizing, fine-tuning, and improving existing models. These activities help us improve the accuracy and functionality of our models.
Some of the data we use may include personal information as defined in California Civil Code Section 1798.140(v), de-identified data as defined in California Civil Code Section 1798.140(m), and aggregate consumer information as defined in California Civil Code Section 1798.140(b). We use tools to process and filter such data as part of our post-training and other exercises.
This documentation is provided pursuant to California Civil Code Section 3111 (AB 2013) and may be updated from time to time.