2/25-27-28 | 3-03 LoRA and Caption Tests in Batches
- issac zhang

- Feb 25
- 6 min read
Updated: Mar 4
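Schedule
3.3 Final universal style LoRA test (no more changes!) | 3.4 Font LoRA | 3.5 Character LoRA | 3.6 Prop LoRA | 3.7 LoRA revisions | 3.8 Storyboard | Rest: hiking |
3.10 Finalize storyboard (same as the project) | 3.11 ComfyUI multi-LoRA and ControlNet tests | 3.12 Styleframe | 3.13 Styleframe | 3.14 Styleframe (one shot only) | 3.15 Raster AE animation??? Or some other type animation?? | Rest |
AE type animation?? | AI human character animation | AI character animation | AI character animation | Laser printing | Laser printing | Laser printing (final piece) |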
LoRA tests:
Testing the parameters from a video tutorial

Results:

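The results from this dataset were not very satisfying. For example, at Epoch 20 the prompt "zzcstylet1,a Taoist boy wearing a traditional Chinese robe" only partially imitated the training-set style: the leftmost image did not seem to be triggered by the prompt at all, whereas the middle image looked quite close. Overall, this run did not fully learn the dataset's style, so the captions and the config probably still need improvement.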

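Testing the Indian guy's parameters
While I was at it, I also tested the Indian guy's Flux config to compare the results. It learned no style at all. Huang Jun (my teacher) said this was caused by the learning rate.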

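Testing Huang Jun's config
I refined the captions with a combination of GPTs and manual editing, then ran further tests with Huang Jun's config file. Depending on the results, I will move on to training the other concept LoRAs.
The results were similar to the online tutorial: style inheritance was mediocre. The model can roughly reproduce the images from their original captions, but it generalizes poorly to scenes outside the training set, and it did not learn the clean line work. I still need to ask my teacher.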


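(03.03) Try showing each image only once per epoch and running more epochs to accumulate steps.
Try a LoRA with no captions? Or one with only minimal captions???
AI upscaling can improve image sharpness, but... it loses a lot of detail.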
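The idea of showing each image once per epoch and accumulating steps over more epochs is really a different split of the same step budget. The sketch below assumes a kohya-ss-style trainer where steps per epoch equal images × repeats ÷ batch size (rounded up), with no gradient accumulation; the helper names are hypothetical, and 95 is the dataset size mentioned in this post.

```python
import math


def steps_per_epoch(num_images: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Optimizer steps in one epoch, assuming each image is seen `repeats` times (hypothetical helper)."""
    return math.ceil(num_images * repeats / batch_size)


def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps over the whole run under the same assumption."""
    return steps_per_epoch(num_images, repeats, batch_size) * epochs


def epochs_for_target(num_images: int, target_steps: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Smallest number of epochs that reaches at least `target_steps`."""
    return math.ceil(target_steps / steps_per_epoch(num_images, repeats, batch_size))


if __name__ == "__main__":
    # Same budget split two ways: many repeats and few epochs vs. one repeat and many epochs.
    print(total_training_steps(95, repeats=10, epochs=2))
    print(total_training_steps(95, repeats=1, epochs=20))
    # Epochs needed to reach roughly 2000 steps with repeats=1 and batch size 1.
    print(epochs_for_target(95, 2000))
```

Both splits give 1900 steps; with repeats=1 each epoch is exactly one pass over the dataset, so per-epoch checkpoints line up with whole passes and are easier to compare.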
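The no-caption vs. minimal-caption question can be tested by generating three variants of the same dataset. This is a minimal sketch that assumes the common sidecar convention where each image `name.png` has a caption file `name.txt` next to it (kohya-ss sd-scripts reads these when its caption extension is set to `.txt`); the trigger phrase reuses "BBCCC style" from the captions in this post, and the mode names and helpers are hypothetical.

```python
import tempfile
from pathlib import Path

TRIGGER = "BBCCC style"  # trigger phrase borrowed from this post's captions


def make_caption(full_caption: str, mode: str, trigger: str = TRIGGER) -> str:
    """Return the caption text for one image under a given captioning strategy."""
    if mode == "none":          # empty caption; how a trainer treats this vs. a missing file depends on the tool
        return ""
    if mode == "trigger_only":  # minimal caption: only the trigger phrase
        return trigger
    if mode == "full":          # detailed caption, made to start with the trigger phrase
        if full_caption.startswith(trigger):
            return full_caption
        return f"{trigger}. {full_caption}"
    raise ValueError(f"unknown caption mode: {mode!r}")


def write_sidecar_captions(image_dir: Path, captions: dict, mode: str) -> list:
    """Write `<image stem>.txt` for each image name in `captions` (image name -> full caption)."""
    written = []
    for image_name, full_caption in captions.items():
        txt_path = (image_dir / image_name).with_suffix(".txt")
        txt_path.write_text(make_caption(full_caption, mode), encoding="utf-8")
        written.append(txt_path)
    return written


if __name__ == "__main__":
    demo = {"man_001.png": "A Chinese woodblock print of a shirtless man with a mustache."}
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("none", "trigger_only", "full"):
            path = write_sidecar_captions(Path(tmp), demo, mode)[0]
            print(mode, repr(path.read_text(encoding="utf-8")))
```

Training the same images three times, changing only the caption mode, isolates the effect of captioning from the other parameters.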
Version 2, revised according to Huang Jun's advice
I revised all of the captions based on Huang Jun's advice, since he said the lines were too thin for the AI to learn the dataset effectively. It took me the entire night to finish all 95 images and their captions.
An example from the previous version compared with the latest version:
Old caption: BBCCC style. A BBCCC Chinese woodblock print of a shirtless man with a mustache and a small beard. He wears a headscarf and loose-fitting pants tied at the waist with a sash. His upper body is bare, revealing his chest and abdomen. His arms are extended outward, and his stance suggests movement, possibly a dance or martial arts pose. The background is a warm beige color, giving the image an aged, historical feel.
New caption: BBCCC style. A BBCCC Chinese woodblock print of a shirtless man with a mustache and a small beard. He wears a headscarf and loose-fitting pants tied at the waist with a sash. His upper body is bare, revealing his chest and abdomen. His arms are extended outward, and his stance suggests movement, possibly a dance or martial arts pose. The BBCCC Chinese woodblock print features precise lines blended with subtle shading to enhance its texture. This highlights the simplicity and purity of the artistic style, evoking a sense of nostalgia.
Result: The output is not good at all. The LoRA only learned the line style and could not learn any sense of design or aesthetics.


Caption tests
Comparison between JoyCaption and Civitai Auto Label

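I found that Auto Label and JoyCaption each emphasize different things. In the two prompts from the example above, JoyCaption focused on detailed descriptions of the objects themselves, while Auto Label described the image as a whole, including the body's pose and movement.
On prompts in general, after reading some articles online (1, 2) and going back over my earlier notes, I found opinions differ: some people recommend using no captions at all, while others run an auto-tagging tool and then edit the results by hand. There is no single standard. There is, however, agreement on a few points. First, the quality of the training images matters far more than the captions or the training parameters. Second, it is important to state the medium of each image, for example photograph, drawing, or painting. Third, for style training, the captions should not focus too much on describing the art style itself, but rather on the content of the image.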

Flux style sample image and caption 
Civitai Auto Label
2. Testing manually revised tags
Captions produced by JoyCaption still contain many artifacts, so I still need to revise them manually with my personal GPT after updating its instructions.
Updated Prompt:
I am a GPT designed to generate detailed and structured captions for multiple images in a single session, following the structured format: Prefix + Scene + Suffix + Parameter. My goal is to provide highly descriptive and visually rich captions in both English and Chinese, ensuring clarity and consistency for AI model training, dataset annotation, and creative applications.
### Caption Structure:
1. Prefix (媒介类型) - Defines the medium of the image (e.g., "a photorealistic 3D render of", "a portrait photo of", "an ink illustration of").
2. Scene (主体 & 场景) - Provides detailed descriptions of the main subject and its environment, adding intricate details to enhance visual richness.
3. Suffix (氛围 & 细节) - Describes mood, lighting, composition, textures, and stylistic elements for added depth.
4. Parameter (技术参数) - Includes technical specifications like camera settings, rendering engines, or LUTs, if applicable.
### Example Output:
English:
"a photorealistic 3D render of, a futuristic humanoid robot standing in a neon-lit cyberpunk city, its sleek metallic body adorned with intricate circuit patterns glowing faintly, its eyes emitting a cold blue light, surrounded by towering skyscrapers covered in animated holographic advertisements, rain-soaked streets reflecting the pulsating neon glow, dense fog rolling in from the distance, vibrant neon blues and purples, mysterious and high-tech, dramatic lighting with strong rim lights accentuating the contours of the robot's form, urban nightscape with heavy fog, rendered using Unreal Engine 5, ray-traced reflections, soft shadows."
中文:
"写实的 3D 渲染, 霓虹灯照耀下的赛博朋克城市中站立的未来人形机器人, 光滑的金属身体上雕刻着精密的电路纹理并微微发光, 眼睛散发出冰冷的蓝色光芒, 四周高耸入云的摩天大楼上覆盖着动态全息广告, 被雨水浸湿的街道映射出跳动的霓虹光辉, 远处浓雾缓缓弥漫, 鲜艳的霓虹蓝色和紫色, 神秘且高科技, 戏剧性的灯光与强烈的轮廓光勾勒出机器人的轮廓, 充满浓雾的城市夜景, 渲染引擎 Unreal Engine 5, 光线追踪反射, 柔和阴影."
I focus on delivering structured, highly detailed bilingual captions that emphasize key visual elements, artistic styles, and technical aspects.
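The Prefix + Scene + Suffix + Parameter structure in the prompt above is simple enough to assemble programmatically, which helps keep 95 captions consistent. This is a minimal sketch, not what the GPT actually does internally: the class and field names are hypothetical, the parameter slot is optional as the prompt says, and the comma-joined output mirrors the prompt's comma-separated example output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructuredCaption:
    prefix: str                                          # medium, e.g. "an ink illustration of"
    scene: List[str]                                     # main subject and environment details
    suffix: List[str] = field(default_factory=list)      # mood, lighting, composition, texture
    parameters: List[str] = field(default_factory=list)  # optional technical specs

    def render(self, trigger: Optional[str] = None) -> str:
        """Join the four parts in order, dropping blank entries; optionally lead with a trigger phrase."""
        parts = [self.prefix, *self.scene, *self.suffix, *self.parameters]
        text = ", ".join(p.strip() for p in parts if p.strip())
        return f"{trigger}. {text}" if trigger else text


if __name__ == "__main__":
    caption = StructuredCaption(
        prefix="a Chinese woodblock print of",
        scene=["a shirtless man with a mustache and a small beard",
               "arms extended outward in a martial arts pose"],
        suffix=["warm beige background", "aged, historical feel"],
    )
    print(caption.render(trigger="BBCCC style"))
```

Keeping the medium in its own `prefix` field also matches the earlier caption research, where stating the medium (photo, drawing, painting) explicitly came up as important, while the scene fields describe content rather than style.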
The final 95 processed images have been uploaded to Civitai. Their captions follow the prefix-scene-suffix-parameter pattern and were further refined with GPTs.
Video tests
Traditional Chinese Gongbi painting style, inspired by woodblock prints. A mythical tiger-like creature with a long, flexible neck and multiple faces. The creature moves its four limbs smoothly, shifting expressions subtly. It tilts its head left and right, creating a fluid, mesmerizing effect. The animation lasts 4 seconds, capturing the intricate ink details and delicate brushwork. Subtle textures mimic aged paper, giving the scene an ancient artistic atmosphere
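Wan clearly gives the better result!
Reflection:
1. It seems Flux could be used with the Deforum approach instead of going straight to Kling, which should give more control.
However, after looking into it later, the results were not good: the transitions between animation frames were poor, and it could only produce fairly uncontrollable 3D spatial changes.
2. None of the local captioning tools are very good, and they are too slow; Civitai's auto-labeling works better. A suitable workflow might be Civitai plus Gemini. (;´༎ຶД༎ຶ`) The conclusion after a whole night of testing: it doesn't work.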







