
4/14-4/15 Searching for Training Set Material & Continuing to Organize It

  • Writer: issac zhang
  • Apr 14, 2024
  • 2 min read

Updated: Apr 17, 2024

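I realized that the source material in my training set was still far too weak, which is very likely a major reason the results have been poor. By chance, I came across a few excellent material websites: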

  1. 1

  2. w


Then came the tedious work of cropping the images and captioning the training set.



In total I curated and filtered 250 images, covering three broad categories: people, animals, and composite scenes.
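With a hand-curated set like this, it helps to confirm that every image actually has a caption and to see how the categories are balanced. Below is a minimal Python sketch of such a check. It assumes a hypothetical folder layout `dataset/<category>/<image>` with each caption stored in a `.txt` file sharing the image's base name; that sidecar convention is common among LoRA training scripts, but it is an assumption here, not a description of the author's setup.

```python
from collections import Counter
from pathlib import Path

# Hypothetical set of image extensions; adjust to what the dataset really uses.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def audit_dataset(root: Path) -> tuple[Counter, list[Path]]:
    """Count images per category folder and list images lacking a caption.

    Assumes the hypothetical layout root/<category>/<image>, where each
    caption is a .txt file with the same base name as its image.
    """
    counts: Counter = Counter()
    missing: list[Path] = []
    # "*/*" matches entries exactly one folder below root (the category level).
    for image in sorted(root.glob("*/*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files, notes, and sub-directories
        counts[image.parent.name] += 1
        if not image.with_suffix(".txt").exists():
            missing.append(image)
    return counts, missing


if __name__ == "__main__":
    per_category, uncaptioned = audit_dataset(Path("dataset"))
    print(per_category, sum(per_category.values()))
    print("missing captions:", [str(p) for p in uncaptioned])
```

Counting per folder rather than per filename prefix keeps the category split (people / animals / scenes) visible without any extra metadata.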


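The current captions don't follow a consistent convention, which will noticeably hurt the LoRA results. So, after reading several articles on captioning, I distilled my own set of captioning rules and built a new custom GPT (GPTs) dedicated to writing captions.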

After several hours of tweaking, these were the final instructions for the GPT:

Basis for Captioning

  • Analyzing Images: The GPT analyzes uploaded images to provide comprehensive captions suitable for training visual models.

Analyzing Standards

  1. Globals: General tags that broadly classify the image (e.g., man, woman, anime).

  2. Type/Perspective/"of a...": Descriptions that provide context about the image, including medium (e.g., photograph, illustration), subject (e.g., woman, mountain), and perspective (e.g., from the side, close up). These are combined into a coherent description such as "a woodcut print illustration of a woman."

  3. Action Words: Detailed verbs that describe the main subject's actions or general verbs applicable to the image's concept (e.g., sitting, smiling).

  4. Subject Descriptions: Detailed attributes of the main subject or key elements in the image (e.g., short brown hair, pale pink dress).

  5. Notable Details: Specific details that are prominent but not part of the background or main subject (e.g., sunlight through windows).

  6. Background/Location: Descriptive details of the background setting (e.g., brown couch, wooden floor).

  7. Loose Associations: Less critical elements, often conveying feelings or broad concepts related to the image (e.g., dreary environment).

Output Standards

  • The final output combines "Globals," "Type/Perspective," "Action Words," "Subject Descriptions," "Notable Details," "Background/Location," and "Loose Associations" into a single concise caption: one line that lists every element from these categories, with no additional formatting, so that image descriptions stay clear and consistent.
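The output standard amounts to flattening seven ordered categories into one comma-separated line. A minimal Python sketch of that step, assuming the categories are emitted in the order listed above; the `CaptionParts` class and its field names are hypothetical and not part of any tool, and the example phrases are the ones used in the Analyzing Standards list.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CaptionParts:
    """Hypothetical container: one list of phrases per captioning category.

    Fields are declared in the order the standard lists the categories,
    and to_caption() relies on that declaration order.
    """

    globals_: list[str] = field(default_factory=list)
    type_perspective: list[str] = field(default_factory=list)
    action_words: list[str] = field(default_factory=list)
    subject_descriptions: list[str] = field(default_factory=list)
    notable_details: list[str] = field(default_factory=list)
    background_location: list[str] = field(default_factory=list)
    loose_associations: list[str] = field(default_factory=list)

    def to_caption(self) -> str:
        """Join every non-empty phrase, category by category, into one line."""
        phrases: list[str] = []
        for f in fields(self):  # dataclasses.fields() keeps declaration order
            for phrase in getattr(self, f.name):
                # Collapse internal whitespace/newlines so the caption stays on one line.
                cleaned = " ".join(phrase.split())
                if cleaned:
                    phrases.append(cleaned)
        return ", ".join(phrases)


if __name__ == "__main__":
    parts = CaptionParts(
        globals_=["woman"],
        type_perspective=["a photograph of a woman, from the side"],
        action_words=["sitting", "smiling"],
        subject_descriptions=["short brown hair", "pale pink dress"],
        notable_details=["sunlight through windows"],
        background_location=["brown couch", "wooden floor"],
        loose_associations=["dreary environment"],
    )
    print(parts.to_caption())
```

Keeping the categories as separate fields until the last moment makes it easy to spot an image whose caption is missing, say, any background description before the categories are merged into a single line.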

Additional Considerations

  • Bilingual Captions: Each caption is provided in both English and Chinese to serve a broader audience and a wider range of applications.

  • Consistency and Clarity: The GPT ensures consistency in terminology and simplicity in descriptions to avoid confusion and maintain usability across various applications, such as image recognition and cataloging.

  • Generic Class Tags: Uses generic tags broadly applicable in image recognition or generation models, enhancing the utility of the captions in diverse applications.
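Consistent terminology and generic class tags can also be enforced after the fact, outside the GPT. The sketch below maps variant tags onto one canonical generic tag and drops repeats. The `CANONICAL` synonym table is purely hypothetical and would need to be built from the vocabulary that actually shows up in the dataset; note that it only matches whole comma-separated tags, not words inside longer phrases.

```python
# Hypothetical synonym table: lower-cased variant -> canonical generic tag.
CANONICAL: dict[str, str] = {
    "lady": "woman",
    "female": "woman",
    "photo": "photograph",
    "close-up": "close up",
    "closeup": "close up",
}


def normalize_caption(caption: str, canonical: dict[str, str] = CANONICAL) -> str:
    """Map each comma-separated tag to its canonical form and drop repeats.

    Matching is case-insensitive on whole tags; tags not in the table are
    kept exactly as written, and the first occurrence of a tag wins.
    """
    seen: set[str] = set()
    out: list[str] = []
    for raw in caption.split(","):
        tag = raw.strip()
        if not tag:
            continue  # ignore empty slots such as "a,, b"
        tag = canonical.get(tag.lower(), tag)
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            out.append(tag)
    return ", ".join(out)
```

For example, `normalize_caption("Lady, woman, Close-Up, sitting")` collapses the first two tags into a single `woman` and rewrites `Close-Up` as `close up`.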

Example Output Format

  • English version: "A woodcut print illustration of a traditional Chinese opera character in elaborate costume and white facial makeup stands holding a weapon, with two smaller figures in the background, full shot, static pose, set against a plain textured canvas, accompanied by Chinese script, historical storytelling."

  • Chinese version: "这是一幅木刻版画插图,画的是一个中国传统戏曲人物,穿着精致的服装,面着白色的妆容,手持武器,背景是两个较小的人物,全景,静态姿势,背景是一块朴素的纹理画布,配上中文文字,讲述历史故事."
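If the GPT's replies follow the example format above, the two captions can be separated automatically, for instance to keep the English line for the training caption file and the Chinese line for reference. A small sketch, assuming each caption is wrapped in straight double quotes after an `English version:` or `Chinese version:` label; the regex and the `split_bilingual` name are hypothetical, and a reply that uses curly quotes or different labels would need a different pattern.

```python
import re

# Assumes labels and straight double quotes exactly as in the example format.
_PATTERN = re.compile(r'(English|Chinese) version:\s*"(.*?)"', re.DOTALL)
_KEYS = {"English": "en", "Chinese": "zh"}


def split_bilingual(reply: str) -> dict[str, str]:
    """Return {"en": ..., "zh": ...} for whichever labelled captions are present."""
    return {_KEYS[label]: text.strip() for label, text in _PATTERN.findall(reply)}
```

Because the pattern is non-greedy and anchored on the labels, bullet characters or extra text around the two lines do not affect the result.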


The output looked like this:

[Image: sample caption output from the GPT]




© Powered by Zicheng Zhang
