Smallem — Details

Here you'll find the full explanations for each of the tasks I perform. Every entry on the main page corresponds to a section below. Links in my edit summaries point directly to the relevant section.

// CITATION CONTENT

🌐 Language

The language parameter only accepts standard ISO codes or names recognized by MediaWiki. Values like |language=Inglese or |language=al (incorrect code for Albanian) cause articles to be categorized under CS1 maint: unrecognized language. I convert language values to their proper ISO codes — values recognized by every Wiki project in any language.

Some editors write the parameter name itself in Albanian (|gjuha=, |gjuhe=, |gjuhë=), but the citation module does not accept parameter names in the local language. This causes categorization under CS1 errors: unsupported parameter. I rename them to the supported form, |language=.

I also normalize the variant |lang= — a synonym of |language= accepted by the module — treating it identically to |language=.
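The language handling above can be sketched roughly as follows. This is only an illustration: the lookup table `LANGUAGE_CODES` and the function name are hypothetical, and a real table would be far larger.

```python
import re

# Hypothetical lookup table (assumption); keys are lowercase.
LANGUAGE_CODES = {"inglese": "en", "english": "en", "al": "sq", "shqip": "sq"}

# Albanian spellings of the parameter name that the module rejects.
LOCAL_NAMES = re.compile(r"\|\s*(?:gjuha|gjuhe|gjuhë)\s*=")
LANGUAGE_PARAM = re.compile(r"\|\s*(language|lang)\s*=\s*([^|}]*)")


def normalize_language(citation):
    """Rename local parameter names and map known values to ISO codes."""
    citation = LOCAL_NAMES.sub("|language=", citation)

    def fix_value(match):
        code = LANGUAGE_CODES.get(match.group(2).strip().lower())
        if code is None:
            return match.group(0)  # unknown value: leave untouched
        return f"|{match.group(1)}={code}"

    return LANGUAGE_PARAM.sub(fix_value, citation)
```

Unknown values are deliberately left alone, so a value the table does not cover is never replaced by a guess.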

📅 Date

The citation module performs a basic translation of month names to the local language. In many cases this suffices, but in certain cases it deviates from Albanian orthography — which is why articles with auto-translated dates are categorized under CS1 maint: date auto-translated. I perform the full adaptation of all date types from English to Albanian, including abbreviated months (Jan → janar, Mar → mars). Besides |date=, I also normalize |orig-date= — the original publication date for republished works.

Dates written in MDY order (janar 15, 2020) are reformatted to DMY (15 janar 2020), the standard Albanian format.
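A minimal sketch of the month translation and the MDY to DMY reordering, assuming dates arrive as plain strings. The names `MONTHS`, `translate_months` and `to_dmy` are illustrative and not taken from the bot's source.

```python
import re

MONTHS = {
    "january": "janar", "february": "shkurt", "march": "mars",
    "april": "prill", "may": "maj", "june": "qershor",
    "july": "korrik", "august": "gusht", "september": "shtator",
    "october": "tetor", "november": "nëntor", "december": "dhjetor",
}
# Three-letter English abbreviations map to the same full Albanian names.
MONTHS.update({en[:3]: sq for en, sq in list(MONTHS.items())})

# "month day, year", where the month may contain non-ASCII letters.
MDY = re.compile(r"^([^\W\d]+) (\d{1,2}), (\d{4})$")


def translate_months(date):
    """Replace English month names (full or abbreviated) with Albanian ones."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: MONTHS.get(m.group(0).lower(), m.group(0)),
        date,
    )


def to_dmy(date):
    """Translate the month, then reorder MDY dates to DMY."""
    date = translate_months(date.strip())
    m = MDY.match(date)
    return f"{m.group(2)} {m.group(1)} {m.group(3)}" if m else date
```

For instance, `to_dmy("January 15, 2020")` gives `15 janar 2020`, while a date already in DMY order only has its month translated.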

Date ranges require the en dash – not the hyphen -. Incorrect usage triggers categorization under CS1 maint: date format; I make the necessary replacement. Abbreviated ranges (1990–98) are expanded to full form (1990–1998), but only when the ending is unambiguous and within the same century.

Leading zeros on day numbers (01 janar 2020 instead of 1 janar 2020) trigger a CS1 error; I remove them.
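The range and leading-zero rules might look like the sketch below. It covers only plain year ranges, and it treats "unambiguous" as "the expanded end year is later than the start year", which is an assumption about the bot's actual test.

```python
import re

YEAR_RANGE = re.compile(r"(\d{4})\s*[-\u2013]\s*(\d{4}|\d{2})")


def fix_range(date):
    """Use an en dash in year ranges and expand 1990-98 to the full form."""
    m = YEAR_RANGE.fullmatch(date)
    if not m:
        return date
    start, end = m.group(1), m.group(2)
    if len(end) == 2:
        candidate = start[:2] + end  # same century as the start year
        if int(candidate) <= int(start):
            return date  # ambiguous (could be year-month): leave as is
        end = candidate
    return f"{start}\u2013{end}"


def strip_leading_zero(date):
    """Turn '01 janar 2020' into '1 janar 2020'; ISO dates are not matched."""
    return re.sub(r"^0(\d) ", r"\1 ", date)
```

A value such as `1998-02` is returned unchanged, because it could equally be a year and a month.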

When a full date (day, month, year) is present but |year= is used instead of |date=, I make the necessary change.

When both |date= and |year= are present simultaneously and the year in |year= repeats the one in |date=, the |year= parameter is redundant — I remove it, preventing categorization under CS1 errors: redundant parameter. When the values differ, the situation requires human judgment — I leave both parameters as they are.
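The two |year= rules can be illustrated on a citation already parsed into a dictionary of parameters. The dictionary representation and the function name are assumptions made for the example.

```python
import re

# Day, month name, year; the month may contain non-ASCII letters.
FULL_DATE = re.compile(r"\d{1,2} [^\W\d]+ \d{4}")


def reconcile_year(params):
    """Move full dates out of |year= and drop a |year= that repeats |date=."""
    date = params.get("date", "").strip()
    year = params.get("year", "").strip()
    if not date and FULL_DATE.fullmatch(year):
        # A full date stored in |year= belongs in |date=.
        return {("date" if k == "year" else k): v for k, v in params.items()}
    if date and re.fullmatch(r"\d{4}", year):
        if re.findall(r"\d{4}", date) == [year]:
            # Redundant: the same year is already part of |date=.
            return {k: v for k, v in params.items() if k != "year"}
    return params  # differing values are left for a human
```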

✂️ Extra text

Identifier parameters have strict canonical formats. Adding extra prefixes to their values triggers CS1 errors (or maintenance categorization in some cases).

Explanatory text in volume, issue, edition, pages: Setting Vol. 3 instead of just 3, No. 5 instead of 5, 3rd ed. instead of 3rd, pp. 10–15 instead of 10–15 triggers categorization under the respective CS1 error categories. I remove these extras, leaving only the core value.

The Albanian abbreviation fq. in |pages= and the suffix (ed.) / (eds.) after editor names are treated the same way.
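As an illustration, the stripping of extra text could be driven by one pattern per parameter. The table below is a hypothetical sketch, not the bot's actual rule set; prefixes are removed only when a digit follows, so that ordinary words starting with "p" or "no" are not damaged.

```python
import re

# One illustrative pattern per parameter (assumption).
EXTRA_TEXT = {
    "volume": r"^vol(?:ume)?\.?\s*(?=\d)",
    "issue": r"^(?:no|issue)\.?\s*(?=\d)",
    "pages": r"^(?:pp?|fq)\.?\s*(?=\d)",
    "edition": r"\s+ed(?:ition)?\.?$",
    "editor": r"\s*\(eds?\.?\)$",
}


def strip_extra(name, value):
    """Remove explanatory prefixes or suffixes, keeping only the core value."""
    pattern = EXTRA_TEXT.get(name)
    if pattern is None:
        return value
    return re.sub(pattern, "", value.strip(), flags=re.IGNORECASE)
```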

🔣 Special characters

Templates {{ndash}}, {{endash}}, {{en dash}} and {{en-dash}} produce the en dash –; templates {{mdash}}, {{emdash}}, {{em dash}} and {{em-dash}} produce the em dash —. Inside citations there's no need for these templates — I replace them directly with the actual characters.

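HTML entities such as &quot;, &ndash;, &mdash;, &nbsp;, &hellip;, &laquo;, &raquo;, as well as those for the apostrophe and the typographic quotes, are replaced with the characters they stand for (", ', –, —, the non-breaking space, …, «, »). The ASCII entities (those for " and ') are safe to replace everywhere within a citation; typographic entities are replaced only in parameters that don't hold URLs, which prevents breakage in URL parameter values.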

The {{URL}} template is intended for use in infoboxes, not inside citations. When I find |url={{URL|http://...}} or similar variants, I unwrap the template and leave only the bare address that CS1 expects.
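The three replacements in this section can be sketched together as one per-value cleanup. The entity tables, the `URL_PARAMS` set and the function name are assumptions for the example; only the simplest {{URL|...}} form is handled.

```python
import re

DASH_TEMPLATES = [
    (re.compile(r"\{\{\s*(?:ndash|endash|en dash|en-dash)\s*\}\}", re.I), "\u2013"),
    (re.compile(r"\{\{\s*(?:mdash|emdash|em dash|em-dash)\s*\}\}", re.I), "\u2014"),
]
URL_TEMPLATE = re.compile(r"\{\{\s*URL\s*\|\s*([^|{}]+?)\s*\}\}", re.I)

ASCII_ENTITIES = {"&quot;": '"', "&#39;": "'", "&apos;": "'"}
TYPOGRAPHIC_ENTITIES = {
    "&ndash;": "\u2013", "&mdash;": "\u2014", "&nbsp;": "\u00a0",
    "&hellip;": "\u2026", "&laquo;": "\u00ab", "&raquo;": "\u00bb",
}
# Parameters whose values are addresses (illustrative subset).
URL_PARAMS = {"url", "archive-url", "chapter-url"}


def clean_value(name, value):
    """Replace dash templates, unwrap {{URL}}, and decode known entities."""
    for pattern, char in DASH_TEMPLATES:
        value = pattern.sub(char, value)
    value = URL_TEMPLATE.sub(r"\1", value)
    entities = dict(ASCII_ENTITIES)
    if name not in URL_PARAMS:
        entities.update(TYPOGRAPHIC_ENTITIES)  # never inside URLs
    for entity, char in entities.items():
        value = value.replace(entity, char)
    return value
```

An explicit table is used instead of a general decoder so that entities standing for wikitext-significant characters are never expanded by accident.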

👻 Invisible characters

I remove certain Unicode control characters that are invisible to the naked eye but harmful to rendering: C0 characters (U+0001 through U+001F with some exceptions), DEL (U+007F), C1 characters (U+0080 through U+009F), zero-width space (U+200B) and the replacement character (U+FFFD). Their presence in citations triggers categorization under CS1 errors: invisible character.

I replace special whitespace with plain spaces: TAB (U+0009), LF (U+000A), CR (U+000D), non-breaking space (U+00A0) and hair space (U+200A). These cases often involve vertically formatted citations where each parameter sits on its own line; after replacement, the formatting returns to the standard horizontal layout.

The soft hyphen (U+00AD) is replaced with a regular hyphen -.
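A compact sketch of these three rules, assuming the C0 "exceptions" are exactly TAB, LF and CR (the characters that are turned into spaces instead of being removed). The function name is illustrative.

```python
import re

# Whitespace-like characters that become a plain space.
SPACE_LIKE = re.compile("[\t\n\r\u00a0\u200a]")
# Characters dropped outright: C0 (minus TAB/LF/CR), DEL, C1,
# zero-width space and the replacement character.
INVISIBLE = re.compile(
    "[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f-\u009f\u200b\ufffd]"
)
SOFT_HYPHEN = "\u00ad"


def scrub(text):
    """Normalize special whitespace, then drop invisible characters."""
    text = SPACE_LIKE.sub(" ", text)
    text = INVISIBLE.sub("", text)
    return text.replace(SOFT_HYPHEN, "-")
```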

// PARAMETER STRUCTURE

♻️ Deprecated parameters

The citation module is always evolving, and over time some parameters come to be considered deprecated. Using them triggers categorization under CS1 errors: deprecated parameters. The full list can be found at Module:Citation/CS1/Suggestions, in the entries marked with the comment "old parameter name". I update them with their current replacements.

The value |ref=harv is no longer supported; the parameter is removed entirely when found with this value.

⬜ Empty parameters

Parameters with empty values (|param1=|param2=|) serve no function — I remove them.

Pipe characters serve to start a parameter; placing two consecutive pipes || or leaving a pipe at the end of a citation |}} constitutes an empty parameter and triggers categorization under CS1 errors: empty citation. I remove the extra pipe to fix the error.
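On flat citation text, both cases can be approximated with two regular expressions. This sketch assumes a citation without nested templates and is not the bot's actual parser.

```python
import re

# |name= followed directly by the next pipe or the closing braces.
EMPTY_NAMED = re.compile(r"\|\s*[^|={}]*=\s*(?=\||\}\})")
# A bare pipe followed by another pipe or the closing braces.
STRAY_PIPE = re.compile(r"\|\s*(?=\||\}\})")


def drop_empty_params(citation):
    """Remove empty named parameters, doubled pipes and trailing pipes."""
    citation = EMPTY_NAMED.sub("", citation)
    return STRAY_PIPE.sub("", citation)
```

Because both patterns use a lookahead, the pipe that starts the next parameter is never consumed, so several empty parameters in a row are removed in a single pass.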

🔗 Wikilinks in parameters

Parameters such as |author-link=, |editor-link=, |title-link=, |series-link=, |episode-link= and similar should contain only the plain article title on Wikipedia — never wikitext links in the form [[...]]. When an editor writes |author-link=[[John Smith]] instead of |author-link=John Smith, the module flags it as an error. I fix it by removing the brackets and keeping only the intended link target; in the case of piped links [[Target|Label]], I keep only the part before the pipe, i.e. the actual article title.
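The unwrapping rule is small enough to show in full; the function name is illustrative.

```python
import re

# [[Target]] or [[Target|Label]] spanning the whole value.
WIKILINK = re.compile(r"^\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]$")


def unwrap_link(value):
    """Return the link target of a wikilinked value, or the value unchanged."""
    m = WIKILINK.match(value.strip())
    return m.group(1).strip() if m else value
```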

🌍 Website address

The |website= parameter is a synonym of |work= and expects the site name as a source (e.g. |website=Reuters), not its address. When editors put a URL as the value (e.g. |website=https://www.bbc.com), they're clearly confusing it with |url=. I fix it by renaming: when the value starts with http:// or https://, the parameter becomes |url= with the same value. Simple values like |website=BBC News are not touched. When |url= already exists in the citation, I leave everything untouched — the situation requires human judgment.
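Expressed on a dictionary of parameters (a representation assumed for the example), the rename looks like this:

```python
def fix_website(params):
    """Rename |website= to |url= when its value is an address."""
    site = params.get("website", "").strip().lower()
    if not site.startswith(("http://", "https://")):
        return params  # a plain site name such as "BBC News"
    if "url" in params:
        return params  # both present: needs human judgment
    # Rebuild the dict so the parameter keeps its original position.
    return {("url" if k == "website" else k): v for k, v in params.items()}
```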

🚦 URL status

When IABot archives sources, it sometimes cannot determine the status of the original address and leaves the parameter with the value |url-status=bot: unknown. I resolve it: when |archive-url= is present and not empty, I change the value to dead, on the assumption that the archive exists precisely because the original address has gone down; when there is no archive URL, I delete the parameter entirely.

Invalid values |url-status=yes and |url-status=true are changed to dead; |url-status=no and |url-status=false are changed to live. Values written with capital letters (Dead, Live, Unfit etc.) are normalized to lowercase — the module accepts them, but uniformity keeps the source code clean.
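The decision table for |url-status= can be sketched as below, again on a dictionary of parameters (an assumption of the example). The `KNOWN` set lists the values the sketch lowercases.

```python
KNOWN = {"dead", "live", "unfit", "usurped", "deviated"}
ALIASES = {"yes": "dead", "true": "dead", "no": "live", "false": "live"}


def fix_url_status(params):
    """Resolve 'bot: unknown', map yes/no style values, lowercase the rest."""
    out = dict(params)
    if "url-status" not in out:
        return out
    value = out["url-status"].strip().lower()
    if value == "bot: unknown":
        if out.get("archive-url", "").strip():
            out["url-status"] = "dead"  # an archive exists
        else:
            del out["url-status"]       # nothing to describe
    elif value in ALIASES:
        out["url-status"] = ALIASES[value]
    elif value in KNOWN:
        out["url-status"] = value
    return out
```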

📦 Archive without URL

The parameters |archive-date= and |archive-format= have no meaning without the presence of |archive-url= — setting them triggers categorization under the CS1 archive URL error category. Since there is no valid case where these parameters can exist without the archive URL, I remove them when it is missing or empty.
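In the same dictionary-based style (an assumption of these examples), the rule reduces to a single filter:

```python
ORPHANS = ("archive-date", "archive-format")


def drop_orphan_archive(params):
    """Drop archive metadata when |archive-url= is missing or empty."""
    if params.get("archive-url", "").strip():
        return params
    return {k: v for k, v in params.items() if k not in ORPHANS}
```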

⚓ Dependent parameters

Some CS1 parameters only have meaning when another specific parameter is present and not empty. When that required partner is missing, the dependent parameter serves no purpose and triggers CS1 errors.

🩹 Dead links

The {{dead link}} template (or variants {{dead-link}}, {{dead url}}) is a marker that editors place after a citation to indicate the address is no longer reachable. The canonical form is |url-status=dead inside the citation itself. When I find a {{dead link}} template immediately after a citation that has |url= but no |url-status=, I add |url-status=dead inside the citation and remove the external template — same meaning, less duplication.
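A rough sketch of folding the marker into the citation, assuming the citation contains no nested templates. The pattern and function names are illustrative.

```python
import re

# A citation template immediately followed by a dead-link marker.
CITE_THEN_DEAD = re.compile(
    r"(\{\{\s*(?:cite\s+\w+|citation)\s*\|[^{}]*)\}\}"
    r"\s*\{\{\s*dead(?: link|-link| url)\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)
HAS_URL = re.compile(r"\|\s*url\s*=\s*[^|\s}]")
HAS_STATUS = re.compile(r"\|\s*url-status\s*=")


def absorb_dead_link(text):
    """Replace a trailing {{dead link}} with |url-status=dead in the citation."""
    def repl(m):
        citation = m.group(1)
        if not HAS_URL.search(citation) or HAS_STATUS.search(citation):
            return m.group(0)  # conditions not met: leave both as they are
        return citation + "|url-status=dead}}"

    return CITE_THEN_DEAD.sub(repl, text)
```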

📐 Formatting

I standardize citation template formatting to the compact form: no extra spaces around pipe characters |, around equals signs = or before the closing }}; multiple spaces within values are consolidated into a single one. Letter casing in the template name is preserved as-is.
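The compact form can be approximated with four substitutions on the template text. This is a sketch under the assumption that the citation has no nested templates; only the first equals sign of each parameter is touched, so equals signs inside URLs survive.

```python
import re


def compact(citation):
    """Collapse spacing around pipes, the first '=' of each parameter, and '}}'."""
    text = re.sub(r"\s*\|\s*", "|", citation)
    text = re.sub(r"\|([^|=}]+?)\s*=\s*", r"|\1=", text)
    text = re.sub(r"\s+\}\}$", "}}", text)
    return re.sub(r" {2,}", " ", text)
```

The template name is not rewritten, so `{{Cite web ...}}` keeps its capital letter.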

Per MOS:REFPUNCT, <ref> tags should be attached directly to the preceding word without a space; also there should be no spaces between consecutive <ref> tags or within the reference content. I clean them up accordingly.

The same formatting treatment is applied to the short-form templates {{sfn}} and {{harv}} (including variants {{sfnp}}, {{sfnm}}, {{harvnb}}, {{harvtxt}}, {{harvp}}).

// SOURCE ENRICHMENT

✨ Enrichment

Beyond fixes, I also perform citation enrichment: when a citation contains a certain identifier, I query the relevant public services for other related identifiers and add them to the citation. Each enrichment fill is logged with a distinct label in the edit summary, by type of identifier added.

PMID → DOI, PMC (via PubMed): when the citation has |pmid=, I look up the corresponding DOI and PMC from PubMed's identifier cross-reference list and add them.

OCLC → ISBN (via Open Library): when the citation has |oclc=, I retrieve the ISBN from the corresponding Open Library record and add it.

DOI → open-access URL (via Fatcat): when the citation has |doi=, I search for a free copy in Fatcat (Internet Archive Scholar) and add it as |url= — giving the reader a toll-free path to the paper.

// PAGE CONTEXT

🗑️ Empty references

Using reference tags with no content inside — forms like <ref></ref>, <ref>{{cite web}}</ref> or <ref>{{Cite web|}}</ref> — creates references that artificially inflate the source count without actually containing any content. I remove them. Named references (<ref name="x">...</ref>) are never touched, even when empty — because they may be called from elsewhere in the article as <ref name="x" /> and removing them would break those calls.
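One way to express this rule is a pattern that only matches <ref> tags without attributes, so that named references can never be caught. The pattern below is an illustrative sketch.

```python
import re

# <ref> with no attributes, containing nothing or only a bare cite template.
EMPTY_REF = re.compile(
    r"<ref\s*>\s*(?:\{\{\s*cite\s+\w+\s*\|?\s*\}\})?\s*</ref>",
    re.IGNORECASE,
)


def drop_empty_refs(text):
    """Remove unnamed references that carry no content."""
    return EMPTY_REF.sub("", text)
```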

⛓️ Duplicate references

Consecutive repeated references just add duplicate numbering in the text with no added value — I remove the repetitions. If references are only partially identical (they cite the same source but with different wording or whitespace), they are not touched — that assessment requires human judgment.
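Exact consecutive repeats can be detected with a backreference, as in this sketch. Only byte-for-byte identical, directly adjacent references are collapsed, which matches the conservative rule described above.

```python
import re

# One complete reference, then one or more exact copies right after it.
# The inner part may not contain another <ref> or </ref> tag.
DUPLICATE = re.compile(
    r"(<ref\b[^>/]*>(?:(?!</?ref\b).)*</ref>)\1+",
    re.DOTALL,
)


def drop_consecutive_duplicates(text):
    """Keep a single copy of directly repeated identical references."""
    return DUPLICATE.sub(r"\1", text)
```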

✒️ Reference quotes

<ref> tags accept unquoted attribute values (<ref name=foo>) for simple alphanumeric values. However, missing quotes cause silent breakage when the value contains spaces or special characters. I add the quotes: <ref name=foo> → <ref name="foo">. This covers all reference attributes that MediaWiki recognizes: name, group, follow, extends and dir.
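A sketch of the quoting step: the attribute pattern consumes already quoted values whole, so text inside existing quotes is never rewritten. Names are illustrative.

```python
import re

REF_OPEN = re.compile(r"<ref\b([^>]*)>", re.IGNORECASE)
# attribute = "quoted" | 'quoted' | bare (captured in group 2)
ATTRIBUTE = re.compile(r"(\w+)\s*=\s*(?:\"[^\"]*\"|'[^']*'|([^\s\"'>/]+))")
KNOWN_ATTRS = {"name", "group", "follow", "extends", "dir"}


def quote_ref_attrs(text):
    """Add double quotes around bare attribute values in <ref> tags."""
    def fix_attr(a):
        if a.group(2) is None or a.group(1).lower() not in KNOWN_ATTRS:
            return a.group(0)  # already quoted, or not a known attribute
        return f'{a.group(1)}="{a.group(2)}"'

    return REF_OPEN.sub(
        lambda m: "<ref" + ATTRIBUTE.sub(fix_attr, m.group(1)) + ">", text
    )
```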

📑 Reference headings

Different editors use different headings for the references section. Some title it "Referenca", "Referencë" or "Citime", others "Burime". The term "Referenca" is a calque from the English "References" and does not exist in Albanian dictionaries. The other two terms ("Citime" and "Burime") exist, but to maintain a single standard I convert all heading variants to "Referime". The heading level (== or ===) is preserved unchanged.
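The heading conversion can be sketched with a single multi-line pattern. The backreference keeps the number of equals signs, and the captured spacing is written back unchanged.

```python
import re

HEADING = re.compile(
    r"^(={2,})([ \t]*)(?:Referenca|Referencë|Citime|Burime)([ \t]*)\1[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def fix_heading(text):
    """Rename reference-section headings to 'Referime', keeping the level."""
    return HEADING.sub(r"\1\2Referime\3\1", text)
```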

📂 Manual categories

CS1 categories serve to fix citation problems, to track certain features, or for statistical purposes. For these goals to work, categories should only be added automatically by the citation templates — not by hand. When an editor adds them manually ([[Category:CS1 errors: ...]]), it skews statistical accuracy and leaves the article in the category even after the error has been fixed. I remove manually added categories.
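Removing hand-added tracking categories might look like the sketch below. Accepting the local namespace name "Kategoria" next to "Category" is an assumption of the example.

```python
import re

MANUAL_CS1 = re.compile(
    r"\[\[\s*(?:Category|Kategoria)\s*:\s*CS1[^\[\]]*\]\]\n?",
    re.IGNORECASE,
)


def drop_manual_cs1_categories(text):
    """Remove CS1 tracking categories that were typed in by hand."""
    return MANUAL_CS1.sub("", text)
```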

// MICROSERVICES

🛟 Article Wizard

Articles created with the Article Wizard come with a set of instructional HTML comments (in one of two wordings, the second being the formal variant) and the {{NIA}} template at the end. These are meant to be removed by the editor before publishing, but are often forgotten.

I remove both variants of the instructional comment (singular and plural) as well as the {{NIA}} template. Other HTML comments in the article (that don't contain the specific Article Wizard phrase) are never touched — only those I recognize as forgotten instructions.

🔢 Transclusion count

Module:Transclusion count shows how many times each template or module is used on the platform. To function, this module needs a bot that reads the database and writes the numbers to the /data/X subpages. I take this on.

I query the Wikimedia analytics database for templates (namespace 10) and modules (namespace 828) used more than 2,000 times. Numbers are rounded to two significant figures (or three for counts over 100,000). Each title is classified by its first letter into the corresponding alphabetical category — for Albanian this includes digraphs (Dh, Gj, Ll, Nj, Rr, Sh, Th, Xh, Zh); titles that don't start with any index letter go to the "Other" category.
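The rounding and the alphabetical classification can be illustrated as follows. Rounding half up and the exact set of single index letters in `ALPHABET` are assumptions; the real index may differ.

```python
DIGRAPHS = ("Dh", "Gj", "Ll", "Nj", "Rr", "Sh", "Th", "Xh", "Zh")
# Single letters of the Albanian alphabet (assumed index letters).
ALPHABET = "ABCÇDEËFGHIJKLMNOPQRSTUVXYZ"


def round_sig(count):
    """Round to 2 significant figures, or 3 for counts over 100,000."""
    digits = 3 if count > 100_000 else 2
    factor = 10 ** max(len(str(count)) - digits, 0)
    return (count + factor // 2) // factor * factor  # round half up


def index_letter(title):
    """Return the index category for a title, checking digraphs first."""
    head = title[:2].capitalize()
    if head in DIGRAPHS:
        return head
    first = title[:1].upper()
    return first if first and first in ALPHABET else "Other"
```

Digraphs are checked before single letters, so a title starting with "Sh" is filed under Sh and not under S.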

// VERSION HISTORY

v1 — set of regular expressions tied to user-fixes.py for textual replacements.

v2 — structured citation analysis and much broader CS1 error coverage.

v3 — citation enrichment via API calls to public services (PubMed, Open Library, Fatcat).