4/14-4/15 Finding training set material & continuing to organize it
- issac zhang

- Apr 14, 2024
Updated: Apr 17, 2024
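I realized that the material in my training set is still too poor in quality, and this is very likely a major reason for the weak results. By a stroke of luck, I came across several very good source-material websites.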
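Then came the tedious work of cropping the images and captioning the training set.
In total I sorted and selected 250 images, covering three broad categories: people, animals, and composite scenes.
My current captions do not follow the usual conventions very well, which will noticeably hurt the LoRA results. So, after reading several articles, I distilled my own set of caption-writing rules and built a new custom GPT to write the captions.
After a few hours of tuning, these are the final instructions for the GPT:
Basis for Captioning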
Analyzing Images: The GPT analyzes uploaded images to provide comprehensive captions suitable for training visual models.
Analyzing Standards
Globals: General tags that broadly classify the image (e.g., man, woman, anime).
Type/Perspective/"of a...": Descriptions that provide context about the image, including medium (e.g., photograph, illustration), subject (e.g., woman, mountain), and perspective (e.g., from the side, close up). These are combined to form a coherent description like "a photo woodcut print illustration of a woman."
Action Words: Detailed verbs that describe the main subject's actions or general verbs applicable to the image's concept (e.g., sitting, smiling).
Subject Descriptions: Detailed attributes of the main subject or key elements in the image (e.g., short brown hair, pale pink dress).
Notable Details: Specific details that are prominent but not part of the background or main subject (e.g., sunlight through windows).
Background/Location: Descriptive details of the background setting (e.g., brown couch, wooden floor).
Loose Associations: Less critical elements, often conveying feelings or broad concepts related to the image (e.g., dreary environment).
Output Standards
The final output compiles "Globals," "Type/Perspective," "Action Words," "Subject Descriptions," "Notable Details," "Background/Location," and "Loose Associations" into a single, concise caption. The output is a linear, single-sequence caption listing all elements from the structured categories in one line, without additional formatting. This ensures clarity and consistency in presenting image descriptions.
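As a rough illustration of this output standard, the Python sketch below joins the seven categories, in the order listed above, into one single-line caption. The `CaptionParts` class, its field names, and the comma separator are assumptions made for illustration only; the GPT composes its caption in natural language and does not run code like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionParts:
    """Hypothetical container for the seven caption categories."""
    globals_: List[str] = field(default_factory=list)
    type_perspective: List[str] = field(default_factory=list)
    action_words: List[str] = field(default_factory=list)
    subject_descriptions: List[str] = field(default_factory=list)
    notable_details: List[str] = field(default_factory=list)
    background_location: List[str] = field(default_factory=list)
    loose_associations: List[str] = field(default_factory=list)

    def to_caption(self) -> str:
        """Flatten all categories, in order, into one comma-separated line."""
        ordered = [
            self.globals_,
            self.type_perspective,
            self.action_words,
            self.subject_descriptions,
            self.notable_details,
            self.background_location,
            self.loose_associations,
        ]
        pieces = []
        for group in ordered:
            for item in group:
                # Collapse any internal whitespace (including newlines)
                # so the caption stays on a single line.
                cleaned = " ".join(item.split())
                if cleaned:
                    pieces.append(cleaned)
        return ", ".join(pieces)


example = CaptionParts(
    globals_=["woman"],
    type_perspective=["a photograph of a woman", "from the side"],
    action_words=["sitting", "smiling"],
    subject_descriptions=["short brown hair", "pale pink dress"],
    notable_details=["sunlight through windows"],
    background_location=["brown couch", "wooden floor"],
    loose_associations=["dreary environment"],
)
print(example.to_caption())
```

Keeping the categories as separate fields until the last step makes it easy to check that none of them was skipped before the caption is flattened.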
Additional Considerations
Bilingual Captions: Each caption is provided in both English and Chinese to cater to a broader audience and application.
Consistency and Clarity: The GPT ensures consistency in terminology and simplicity in descriptions to avoid confusion and maintain usability across various applications, such as image recognition and cataloging.
Generic Class Tags: Uses generic tags broadly applicable in image recognition or generation models, enhancing the utility of the captions in diverse applications.
Example Output Format
English version: "A woodcut print illustration of a traditional Chinese opera character in elaborate costume and white facial makeup stands holding a weapon, with two smaller figures in the background, full shot, static pose, set against a plain textured canvas, accompanied by Chinese script, historical storytelling."
Chinese version: "这是一幅木刻版画插图,画的是一个中国传统戏曲人物,穿着精致的服装,面着白色的妆容,手持武器,背景是两个较小的人物,全景,静态姿势,背景是一块朴素的纹理画布,配上中文文字,讲述历史故事."
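If the GPT's replies follow this format, the two captions can be separated mechanically before they are saved. The sketch below is only one possible way to do that: the function name `split_bilingual` and the regular expression are hypothetical, and it handles only replies that use the exact `English version:` / `Chinese version:` labels with straight double quotes, each on its own line.

```python
import re
from typing import Dict

# Matches lines such as:  English version: "some caption text"
_CAPTION_LINE = re.compile(
    r'^(English|Chinese) version:[ \t]*"(.*)"[ \t]*$',
    re.MULTILINE,
)


def split_bilingual(reply: str) -> Dict[str, str]:
    """Return {'english': ..., 'chinese': ...} from a reply in the format above."""
    captions: Dict[str, str] = {}
    for label, text in _CAPTION_LINE.findall(reply):
        captions[label.lower()] = text.strip()
    missing = {"english", "chinese"} - set(captions)
    if missing:
        # Fail loudly rather than silently training on an incomplete caption.
        raise ValueError("missing caption(s): " + ", ".join(sorted(missing)))
    return captions
```

Raising an error on an incomplete reply is a deliberate choice here, since a missing caption would otherwise go unnoticed in a batch of 250 images.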
The output is as follows:





