Long time coder here with a STEM degree, but not a CS degree. I've mostly used programming as a tool to help me with my job rather than as my job, so I'm conscious that I have some gaps in my programming skillset.

To close a couple of those gaps, I'm trying to become competent with GitHub and take my Python skills to the next level.

If you kind people could suggest improvements I could make to this repo and the code in it, I'd be ever so grateful. :)

It's a bot that runs Lemmy posts and comments through the pretrained Detoxify transformer model and reports toxic comments for the mods to action.

[–] UlrikHD 3 points 1 year ago* (last edited 1 year ago) (1 children)

Obviously this is opinionated and I won't pretend it's the only correct way, but a few things that stood out to me were:

- Inconsistent use of type hinting. You type hint the "elem" arg for process_content and nothing else. Personally I use type hints religiously, but at the very least I would type hint every arg. The type may be obvious to you now, but it may not be in 6 months, or to others who want to contribute.
- While on the topic of type hints, you use "#" to comment the purpose of each function, but you really should use docstrings instead. Text editors supporting Python will then use the docstrings to show users the description of each function without them having to jump to the declaration to read it. It's particularly useful when you have multiple modules. For some IDEs like PyCharm, the same format works on variables too.
- You should wrap the infinite loop at the bottom in `if __name__ == '__main__':` to avoid getting stuck in it if, down the line, you want to reuse the class/module and import it into another file.
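
To make those three points concrete, here is a minimal sketch. Only the names process_content and "elem" come from your repo; the dict shape, the `threshold` parameter and the `main()` function are assumptions I made up for illustration, so treat it as a pattern rather than a drop-in replacement.

```python
def process_content(elem: dict, threshold: float = 0.8) -> bool:
    """Return True if the content in ``elem`` should be reported.

    ``elem`` is assumed here to be a dict carrying a "toxicity" score;
    the real bot's data structure will differ.
    """
    return elem.get("toxicity", 0.0) >= threshold


def main() -> None:
    """Hypothetical stand-in for the bot's polling loop."""
    sample = {"toxicity": 0.93}
    print(process_content(sample))


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    main()
```

With the guard in place, another file can do `from yourmodule import process_content` (module name hypothetical) without kicking off the loop.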

And the most opinionated point of them all:

I would recommend running a linter like pylint to warn about potential code smells. E.g. you're redefining the Python built-in "id", no exception types are specified in your try blocks, there are too many branches and statements in process_content() (which would probably benefit from being split into smaller functions), lines are twice as long as the recommended length, the import order is wrong, etc. (this is purely pylint feedback).
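
Two of those warnings are quick to fix. The sketch below is not code from your repo: `extract_comment_id` and the JSON payload are hypothetical, it just shows a descriptive name instead of `id` and a try block that names the exceptions it expects.

```python
import json
from typing import Optional


def extract_comment_id(payload: str) -> Optional[int]:
    """Return the "id" field of a JSON payload, or None if it is unusable."""
    try:
        # `comment_id` rather than `id`, so the built-in id() stays usable.
        comment_id = int(json.loads(payload)["id"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Catch only the failures we expect; anything else still surfaces.
        return None
    return comment_id
```

Listing the exception types means a genuine bug (or a Ctrl+C) is not silently swallowed the way it would be with a bare `except:`.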

I assume the setup is similar with GitHub's CI, but with GitLab you can automate pylint to check the code with this:

```yaml
pylint:  # job name; any name works
  image: python:3.10
  script:
    - pip install pylint
    - pylint *folder*
```

[–] [email protected] 2 points 1 year ago

Thank you! I'll go through your suggestions.

[–] [email protected] 1 points 1 year ago

PS: the current version of LemmyModBot is using an as-yet unmerged version of pylemmy, so I'm not sure if it will work on a fresh install right now.

[–] oscar 1 points 1 year ago* (last edited 1 year ago) (1 children)

I haven't gone through it in detail, but something that stood out to me is the complexity of process_content().

If you at some point end up with a large function, or if you have deeply nested blocks, it can help readability to split it up into smaller functions with clearer goals, even if they are only called once. In your case you could keep process_content() as a sort of parent function for calling smaller ones.
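
Roughly what I mean, as a sketch only: the helper names, the dict shape, the threshold and the word-list scorer are all made up for illustration (the real bot would call the Detoxify model where `score_text` is).

```python
from typing import Optional

TOXICITY_THRESHOLD = 0.8  # assumed cut-off, not taken from the repo


def extract_text(elem: dict) -> str:
    """Pull the text to score out of a post or comment."""
    return (elem.get("body") or "").strip()


def score_text(text: str) -> float:
    """Placeholder scorer: fraction of words found in a tiny flag list."""
    flagged = {"idiot"}
    words = text.lower().split()
    return sum(word in flagged for word in words) / len(words) if words else 0.0


def build_report(elem: dict, score: float) -> Optional[str]:
    """Return a report message when the score reaches the threshold."""
    if score < TOXICITY_THRESHOLD:
        return None
    return f"Item {elem.get('id')} flagged with score {score:.2f}"


def process_content(elem: dict) -> Optional[str]:
    """Parent function: each step is delegated to a small helper."""
    text = extract_text(elem)
    score = score_text(text)
    return build_report(elem, score)
```

Each helper can then be tested on its own, and process_content() reads like a summary of the steps.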

I'm guilty of large functions too because it's easier to just add stuff to a single function while developing and debugging, but before I submit stuff I tend to go through and clean up by doing this.

Though I guess this is sort of opinionated too!

[–] [email protected] 2 points 1 year ago

Good point and thank you. My functions do tend to evolve into unwieldy messes.